CN107566242A - Rubbish mail filtering method based on rule of combination - Google Patents

Info

Publication number
CN107566242A
CN107566242A CN201610821016.8A CN201610821016A CN107566242A CN 107566242 A CN107566242 A CN 107566242A CN 201610821016 A CN201610821016 A CN 201610821016A CN 107566242 A CN107566242 A CN 107566242A
Authority
CN
China
Prior art keywords
rule
combination
spam
filtering
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610821016.8A
Other languages
Chinese (zh)
Inventor
杨首哲
罗朝彤
屈强
庄严
黄睿哲
胡雁淇
王艳
戚国飞
刘再元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Medium shift information technology Co., Ltd.
Original Assignee
POLYTRON TECHNOLOGIES Inc
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by POLYTRON TECHNOLOGIES Inc, China Mobile Group Guangdong Co Ltd filed Critical POLYTRON TECHNOLOGIES Inc
Priority to CN201610821016.8A priority Critical patent/CN107566242A/en
Publication of CN107566242A publication Critical patent/CN107566242A/en
Pending legal-status Critical Current

Abstract

The invention discloses a spam filtering method based on combination rules, comprising the steps of: A) establishing sub-rules; B) establishing combination rules; C) loading and storing the rules; D) matching the rules. Building on existing techniques, the invention provides a new filtering technique whose matching relies on a regular expression algorithm and an independently developed semantic interpretation algorithm that supports logical expressions, so both the conditions of a combination rule and the way rules are combined are very flexible. To introduce a new filtering technique, developers only need to generate sub-rules; operations staff can then dynamically adjust the available sub-rules and extend them into new filtering rules. As a supplement to conventional filtering techniques, combination rules enrich the means of intercepting spam, reduce misjudgments, and strengthen spam filtering capability.

Description

Spam filtering method based on combination rules
Technical field
The invention belongs to the field of spam e-mail filtering, and more particularly relates to a filtering technique based on combination rules: a new filtering method obtained by substantially improving existing spam filtering techniques.
Background
Spam changes rapidly over time and its characteristics vary widely. Traditional spam filtering techniques, such as IP filtering, reverse DNS resolution, SMTP frequency control, user blacklists and whitelists, keyword filtering, mail content filtering, real-time blacklists (RBL), user behavior filtering, and rule scoring, each have their own strengths and weaknesses, and none can judge with 100% accuracy.
Summary of the invention
Building on the comprehensive-scoring approach to spam filtering, the invention provides a filtering method based on combination rules. Its purpose is to reduce the misjudgment of spam and strengthen anti-spam filtering capability.
The technical solution adopted by the invention is:
A spam filtering method based on combination rules, comprising the steps of:
A) establishing sub-rules;
B) establishing combination rules;
C) loading and storing the rules;
D) matching the rules.
Further, step A) comprises:
A1) identifying spam with one or more of IP filtering, reverse DNS resolution, SMTP frequency control, user blacklists and whitelists, keyword filtering, mail content filtering, RBL, user behavior filtering, and rule scoring, and generating the corresponding rules;
A2) setting corresponding sub-rules according to fields such as the mail subject, mail headers, sender address, PTR record, recipient address, and sender alias.
Further, step B) comprises:
B1) establishing rules from logs;
B2) establishing whitelist combination rules and blacklist combination rules;
B3) establishing combination rules with configurable score values.
Further, in step C) the rules are loaded and stored in a multiway tree structure.
Further, step D) is the matching flow for combination rules: following the storage structure of the combination rules, matching starts at the head node and continues until every child node has been traversed, finally generating all the anti-spam rules that apply.
Put simply, comprehensive scoring means checking a mail with several techniques and then combining the individual results to decide whether it is spam. Technically, every check applied by the anti-spam system generates a corresponding rule when it fires, each rule carries a score, and if the sum of the scores of all generated rules exceeds a threshold the mail is judged to be spam. The advantages of this architecture are that it reduces misjudgments and is extensible: developers and operations staff only need to improve the anti-spam rule base.
A combination rule integrates several sub-rules into a new rule. The sub-rules may come from the anti-spam system's own technical checks, or they may be configured dynamically by operations staff. The matching condition of the new rule can be set dynamically by operations staff according to actual conditions, and it supports conditional expressions that join sub-rules with logical OR and logical AND. A mail that matches the new rule can be given a rule score, or the rule can be used directly as a blacklist or whitelist mechanism. In short, rule generation and handling are flexible and extensible, and configuring such rules addresses both missed spam and misjudged legitimate mail.
The method is a substantial supplement to the comprehensive-scoring system and, according to the applicant, is the first of its kind in the industry. Its advantage is that it gives operations staff a spam-handling mechanism that can be extended dynamically, combined freely, and configured flexibly. For example, in daily operation some new spam may bypass the existing filtering techniques. Without developers adding any new technical means, operations staff can analyze large volumes of logs to find which rules those mails have in common and then set a combination rule to solve the problem.
Building on existing techniques, the invention provides a new filtering technique, a spam filtering method based on combination rules, which improves spam filtering capability and reduces the likelihood of misjudgment.
Brief description of the drawings
The structure of the invention is described in detail below with reference to the drawings.
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a diagram of the storage structure used by the invention.
Detailed description
The invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the invention is implemented as a spam filtering method based on combination rules, whose main steps are as follows.
A) Establishing sub-rules.
(1) Various filtering techniques, such as IP filtering, reverse DNS resolution, SMTP frequency control, user blacklists and whitelists, keyword filtering, mail content filtering, RBL, user behavior filtering, and rule scoring, are used to identify spam and generate the corresponding rules. These rules can be given a spam score directly, or they can serve as the basis for the combination rules set later. The filtering techniques of an anti-spam system are updated constantly as the market changes, and each new filtering technique produces its own sub-rules.
(2) Operations staff can also set sub-rules on fields such as the mail subject, mail headers, sender address, PTR record, recipient address, and sender alias. For example, a rule can be set as follows:
[BASE_GMAIL_PTR]
PTR=\\.google\\.com
Note: the rule BASE_GMAIL_PTR is matched if the PTR record contains .google.com.
In short, the rules generated in (1) and (2) are base rules, and all later combination rules are built on top of them.
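The PTR example can be illustrated with a minimal Python sketch. It assumes that the doubled backslashes in the configuration are file-level escaping, so the effective pattern is \.google\.com, and that a sub-rule matches when every one of its field patterns is found in the mail. The names SUB_RULES and match_sub_rules and the dictionary layout of a mail are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical in-memory form of the [BASE_GMAIL_PTR] rule: field name -> regex.
# Assumption: the config's "\\." unescapes to the regex "\." (a literal dot).
SUB_RULES = {
    "BASE_GMAIL_PTR": {"PTR": r"\.google\.com"},
}

def match_sub_rules(mail, rules=SUB_RULES):
    """Return the names of all sub-rules whose field patterns are all found in the mail."""
    hits = set()
    for name, conditions in rules.items():
        if all(re.search(pattern, mail.get(field, ""))
               for field, pattern in conditions.items()):
            hits.add(name)
    return hits

# A mail is modelled as a plain dict of header-like fields (an assumption).
mail = {"PTR": "mail-sor-f41.google.com", "Subject": "hello"}
print(match_sub_rules(mail))
```

The set returned here plays the role of the "generated rules" that later combination rules refer to by name.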
B) Establishing combination rules.
Combination rules are established in the following forms:
(1) Establishing rules from logs
By mining large volumes of operation logs, the spam that was missed can be picked out and the sub-rules it has in common can be analyzed. Operations staff can refine those rules into a new filtering rule. For example, suppose log analysis finds some missed spam that shares certain features, and the rules generated for those features are a, b, c and d. Operations staff can then set a combination rule e, configured as follows:
[e]
MetaEvalRule="a and b and c and d"
Note: a mail that generates all four rules a, b, c and d matches rule e.
(2) Establishing whitelist combination rules
In daily operation some mails should be delivered directly without going through the spam filtering flow, for example operational mails. To make sure such mail is delivered, a whitelist combination rule can be set so that matching mail is released immediately. If the subject of such operational mail contains the keyword "credit card", a rule like the following can be set:
[fn_subject]
SubjectKey="credit card"
Note: a mail whose subject contains "credit card" matches rule fn_subject.
Checking only the subject easily leads to misjudgment, so a sub-rule with a sender-address condition can be added:
[fn_from]
Sender="llll\\@139\\.com"
Note: a mail whose sender address is llll@139.com matches rule fn_from.
From these two sub-rules, a combination rule that whitelists such operational mail can be established:
[fn_white]
MetaEvalRule="fn_subject and fn_from"
ActionID="2" # a value of 2 is the whitelist action
Note: a mail that satisfies both fn_subject and fn_from generates the whitelist rule fn_white, and mail matching that rule is released directly.
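As a rough illustration of how a combination rule such as fn_white could be read and fired, the following Python sketch parses the section with the standard configparser module. It is a simplification under stated assumptions: only pure "x and y" conjunctions are handled, values are wrapped in double quotes, and the ActionID values 2 and 3 mean whitelist and blacklist as in the examples. The function names are hypothetical.

```python
import configparser

# Action codes taken from the examples in the text: 2 = whitelist, 3 = blacklist.
ACTIONS = {"2": "whitelist", "3": "blacklist"}

CONFIG = """
[fn_white]
MetaEvalRule="fn_subject and fn_from"
ActionID="2"  # 2 = whitelist action
"""

def load_combination_rules(text):
    """Parse combination rules; only plain 'x and y' conjunctions are supported here."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.optionxform = str  # keep the key case, e.g. MetaEvalRule
    parser.read_string(text)
    rules = {}
    for name in parser.sections():
        section = parser[name]
        expr = section["MetaEvalRule"].strip('"')
        rules[name] = {
            "needs": [tok for tok in expr.split() if tok != "and"],
            "action": ACTIONS.get(section.get("ActionID", "").strip('"')),
        }
    return rules

def fire_combination_rules(rules, hits):
    """Return {rule name: action} for every rule whose required sub-rules all hit."""
    return {name: rule["action"] for name, rule in rules.items()
            if all(need in hits for need in rule["needs"])}

rules = load_combination_rules(CONFIG)
print(fire_combination_rules(rules, {"fn_subject", "fn_from"}))
```

A blacklist rule would be handled by the same code with ActionID="3"; expressions that use or, ! or parentheses need a real expression evaluator.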
(3) Establishing blacklist combination rules
Blacklist combination rules are set on essentially the same principle as whitelist combination rules; only the action value changes. For example, suppose log analysis finds some missed spam that matches the same sub-rules a, b and c, and these rules show obvious spam characteristics. A blacklist combination rule can then be set as follows:
[fn_black]
MetaEvalRule="a and b and c"
ActionID="3" # a value of 3 is the blacklist action
Note: a mail that satisfies rules a, b and c generates the blacklist rule fn_black, and delivery of mail matching that rule is rejected directly.
(4) Establishing combination rules with configurable score values
To reduce misjudgment in spam filtering, the system uses comprehensive scoring: each rule carries a score, and a mail is judged to be spam if the sum of the scores of all its rules exceeds a threshold. Scores can be set not only on base rules but also on combination rules, which extends the scoring system. Suppose, for example, that a kind of missed spam always generates rules a and b, but a and b together are not enough to identify spam outright, so the mail cannot simply be intercepted by a blacklist rule. In that case points can be added for this kind of mail, configured as follows:
[fn_jiafen]
MetaEvalRule="a and b"
RuleScore=1.23 # add 1.23 points
Note: a mail that satisfies rules a and b generates the bonus rule fn_jiafen, and 1.23 points are added to the overall score of mail matching that rule.
This score setting can be extended freely by operations staff. Likewise, when misjudgments occur, points can be deducted; deduction only requires setting the rule score to a negative value.
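The effect of bonus and deduction rules on the overall score can be sketched as follows. The base scores, the threshold of 5.0, the deduction rule fn_koufen and the sub-rule trusted_sender are made-up values for illustration only; the patent specifies only the 1.23 bonus of fn_jiafen. The sketch also handles only a single pass over conjunction-style combination rules.

```python
def apply_score_rules(hits, combos):
    """Add every scored combination rule whose required rules all fired (single pass)."""
    fired = set(hits)
    for name, needs in combos.items():
        if set(needs) <= fired:
            fired.add(name)
    return fired

def overall_score(fired, scores):
    """Sum the score of every fired rule; rules without a configured score count as 0."""
    return sum(scores.get(name, 0.0) for name in fired)

# Hypothetical scores: only fn_jiafen = 1.23 comes from the text.
SCORES = {"a": 2.0, "b": 2.0, "fn_jiafen": 1.23, "fn_koufen": -1.5}
COMBOS = {"fn_jiafen": ["a", "b"], "fn_koufen": ["a", "trusted_sender"]}
THRESHOLD = 5.0  # assumed spam threshold

fired = apply_score_rules({"a", "b"}, COMBOS)
total = overall_score(fired, SCORES)
print(sorted(fired), round(total, 2), total > THRESHOLD)
```

With these numbers, a and b alone total 4.0 and stay under the threshold, while the bonus rule pushes the mail over it; a negative score such as fn_koufen pulls the total back down when a trusted sender is involved.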
C) Loading and storing rules.
Combination rules are all set in configuration files; the program loads them into memory automatically and then matches them in order. Because combination rules are matched on the basis of sub-rules, the matching order is critical: if the order is reversed, rules fail to match. Rules therefore have to be loaded into memory in a principled order, and for a combination rule with dependencies, the rules it depends on must be matched first. To handle this, rules are loaded into memory as a multiway tree, in which every sub-rule that a rule depends on is either that rule's parent node or a sibling of its parent. For example, take the following rules:
[a]
SubjectKey="credit card" # to match sub-rule a, the mail subject must contain "credit card"
RuleScore=0.03 # matching sub-rule a adds 0.03 points
[b]
Sender="ll\\@139\\.com" # to match sub-rule b, the sender address must be ll@139.com
RuleScore=0.02 # matching sub-rule b adds 0.02 points
[c]
MetaEvalRule="a and b" # to match combination rule c, rules a and b must both match
The storage form of these three rules in memory is shown in Fig. 2 (a and b are the parent nodes of c).
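One way to respect the dependency ordering described above is to give every rule a level: 0 for sub-rules, and one more than the deepest dependency for combination rules. The sketch below is an assumption-laden simplification rather than the multiway tree of Fig. 2; it only shows that sorting by such a level guarantees that dependencies load before the rules that use them. The names dependencies, level and load_order are hypothetical.

```python
import re

# The three example rules, already parsed into dicts (assumed representation).
RULES = {
    "a": {"SubjectKey": "credit card", "RuleScore": 0.03},
    "b": {"Sender": r"ll@139\.com", "RuleScore": 0.02},
    "c": {"MetaEvalRule": "a and b"},
}

KEYWORDS = {"and", "or"}

def dependencies(rule):
    """Names of the rules referenced by a rule's MetaEvalRule expression, if any."""
    expr = rule.get("MetaEvalRule", "")
    return [tok for tok in re.findall(r"[A-Za-z_]\w*", expr) if tok not in KEYWORDS]

def level(name, rules, _seen=()):
    """0 for sub-rules; otherwise 1 + the deepest level among the rule's dependencies."""
    if name in _seen:
        raise ValueError(f"circular dependency at {name}")
    deps = dependencies(rules[name])
    if not deps:
        return 0
    return 1 + max(level(dep, rules, _seen + (name,)) for dep in deps)

def load_order(rules):
    """Order rules so that every dependency comes before the rule that needs it."""
    return sorted(rules, key=lambda name: (level(name, rules), name))

print(load_order(RULES))
```

Each level corresponds to one depth of the tree, so a breadth-first walk of the tree visits rules in the same kind of order.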
D) Rule matching.
The matching flow for combination rules is intricate. Rules that others depend on must be matched first, otherwise the dependent rule cannot match. Take the three rules a, b and c from C): if a and b are not matched first, c will never match. Rules are therefore matched starting from the root node of the tree, in breadth-first order, until every node has been traversed.
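The breadth-first matching order can be sketched in Python with a queue. The tree below is one possible reading of the a, b, c example: a virtual root with children a and b, and c stored under a. The Node class, the check callables and the helper names are hypothetical; the patent does not describe its data structures at this level of detail.

```python
import re
from collections import deque

class Node:
    """One rule in the tree; check(mail, hits) decides whether the rule matches."""
    def __init__(self, name, check, children=()):
        self.name = name
        self.check = check
        self.children = list(children)

def field_contains(field, pattern):
    """Sub-rule check: the regex must be found in the given mail field."""
    return lambda mail, hits: re.search(pattern, mail.get(field, "")) is not None

def all_hit(*names):
    """Combination-rule check: all named rules must already have matched."""
    return lambda mail, hits: set(names) <= hits

def match_breadth_first(root, mail):
    """Visit the tree level by level so that dependencies are evaluated first."""
    hits, order = set(), []
    queue = deque(root.children)  # the virtual root itself is not a rule
    while queue:
        node = queue.popleft()
        order.append(node.name)
        if node.check(mail, hits):
            hits.add(node.name)
        queue.extend(node.children)
    return hits, order

c = Node("c", all_hit("a", "b"))
a = Node("a", field_contains("Subject", "credit card"), children=[c])
b = Node("b", field_contains("Sender", r"ll@139\.com"))
root = Node("root", None, children=[a, b])

mail = {"Subject": "Your credit card statement", "Sender": "ll@139.com"}
print(match_breadth_first(root, mail))
```

Because c sits one level below a and b, it is only evaluated after both of them, which is exactly the ordering requirement the text describes.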
Two algorithms are used to check rules. For sub-rules such as a and b in C), conditions are checked mainly with a regular expression library. For rules such as c in C), the system implements a matching algorithm for logical expressions that supports and, or, !, (), &&, > and <, which operations staff can extend as actual conditions require.
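A logical-expression matcher of this kind can be sketched as a small recursive-descent evaluator. This is not the patent's algorithm, whose details are not disclosed; it is a minimal sketch that supports and, or, !, parentheses and && over rule names, assumes the usual precedence (! binds tighter than and, which binds tighter than or), and leaves out > and < because the text does not define their meaning.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|!|&&|[A-Za-z_]\w*)")

def tokenize(expr):
    """Split an expression into '(', ')', '!', '&&' and word tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {expr[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(expr, hits):
    """Evaluate a MetaEvalRule-style expression against the set of matched rule names."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the right side so tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() in ("and", "&&"):
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok is None or tok in ("and", "or", "&&", ")", "!"):
            raise ValueError(f"unexpected token: {tok!r}")
        return tok in hits  # a rule name is true if that rule has matched

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result

print(evaluate("a and b", {"a", "b"}))
print(evaluate("a && (b or !c)", {"a"}))
```

Both calls evaluate to True: in the second, b has not matched but !c holds, so the parenthesized part is true.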
After the combination rules have been matched, the anti-spam system first makes the blacklist/whitelist decision on the generated rules: if a matched rule carries a blacklist or whitelist action, the blacklist or whitelist flow is followed. Otherwise, points are added and deducted according to the generated rules to produce an overall score; if the overall score exceeds a threshold the mail is judged to be spam, and otherwise it follows the normal delivery flow.
In short, matching uses a regular expression algorithm together with an independently developed semantic interpretation algorithm that supports logical expressions, so combination rules are very flexible both in how conditions are matched and in how rules are combined. Operations staff can set combination rules according to actual conditions to catch spam while reducing misjudgments.
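The final decision flow described above, with list actions taking priority over scoring, can be sketched as follows. The ActionID values 2 and 3 come from the examples; the threshold, the result labels, and the choice to let a whitelist action win over a blacklist action when both fire are assumptions, since the text does not say which of the two takes precedence.

```python
WHITELIST, BLACKLIST = 2, 3  # ActionID values used in the examples

def decide(fired_rules, threshold):
    """fired_rules: list of dicts with optional 'action' and 'score' keys."""
    actions = {rule.get("action") for rule in fired_rules}
    if WHITELIST in actions:   # assumption: whitelist is checked before blacklist
        return "deliver"
    if BLACKLIST in actions:
        return "reject"
    total = sum(rule.get("score", 0.0) for rule in fired_rules)
    return "spam" if total > threshold else "deliver"

# Hypothetical fired rules for three different mails, with an assumed threshold of 5.0.
print(decide([{"score": 4.0}, {"action": WHITELIST}], threshold=5.0))
print(decide([{"score": 0.1}, {"action": BLACKLIST}], threshold=5.0))
print(decide([{"score": 4.0}, {"score": 1.23}], threshold=5.0))
```

The three calls print deliver, reject and spam respectively: a list action short-circuits the score, and only mail without any list action is judged by its total.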

Claims (5)

1. A spam filtering method based on combination rules, comprising the steps of:
A) establishing sub-rules;
B) establishing combination rules;
C) loading and storing the rules;
D) matching the rules.
2. The spam filtering method based on combination rules according to claim 1, characterized in that step A) comprises:
A1) identifying spam with one or more of IP filtering, reverse DNS resolution, SMTP frequency control, user blacklists and whitelists, keyword filtering, mail content filtering, RBL, user behavior filtering, and rule scoring, and generating the corresponding rules;
A2) setting corresponding sub-rules according to fields such as the mail subject, mail headers, sender address, PTR record, recipient address, and sender alias.
3. The spam filtering method based on combination rules according to claim 1, characterized in that step B) comprises:
B1) establishing rules from logs;
B2) establishing whitelist combination rules and blacklist combination rules;
B3) establishing combination rules with configurable score values.
4. The spam filtering method based on combination rules according to claim 1, characterized in that in step C) the rules are loaded and stored in a multiway tree structure.
5. The spam filtering method based on combination rules according to claim 1, characterized in that step D) is the matching flow for combination rules: matching starts from the root node of the tree and proceeds in breadth-first order until every node has been traversed, finally generating all the anti-spam rules, wherein the matching algorithms comprise a regular expression algorithm and an independently developed semantic interpretation algorithm that supports logical expressions.
CN201610821016.8A 2016-09-14 2016-09-14 Rubbish mail filtering method based on rule of combination Pending CN107566242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610821016.8A CN107566242A (en) 2016-09-14 2016-09-14 Rubbish mail filtering method based on rule of combination

Publications (1)

Publication Number Publication Date
CN107566242A true CN107566242A (en) 2018-01-09

Family

ID=60973429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610821016.8A Pending CN107566242A (en) 2016-09-14 2016-09-14 Rubbish mail filtering method based on rule of combination

Country Status (1)

Country Link
CN (1) CN107566242A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181120

Address after: 518000 11, 41 floor, Guo Tong Building, 9023 Binhe Road, Futian District, Shenzhen, Guangdong.

Applicant after: Medium shift information technology Co., Ltd.

Applicant after: Polytron Technologies Inc

Address before: 510000 Guangzhou Tianhe District, Guangzhou City, Guangdong Province, No. 11 Pearl River West Road, Pearl River New City, Guangdong Global Building

Applicant before: China Mobile Communication Group Guangdong Co., Ltd.

Applicant before: Polytron Technologies Inc

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180109