Summary of the Invention
Building on comprehensive-scoring anti-spam filtering, the present invention provides a filtering method based on combination rules. Its purpose is to reduce the misjudgment of spam and to strengthen anti-spam filtering capability.
The technical solution adopted by the present invention is as follows.
A spam filtering method based on combination rules, comprising the steps of:
A) establishing sub-rules;
B) establishing combination rules;
C) loading and storing the rules;
D) matching the rules.
Further, step A) comprises:
A1) identifying spam with one or more of IP filtering, reverse domain name resolution, SMTP frequency control, user blacklists and whitelists, keyword filtering, mail content filtering, RBL, user behavior filtering and rule scoring, and generating the corresponding rules;
A2) setting corresponding sub-rules according to fields such as the mail subject, mail header, sender address, PTR record, recipient address and sender alias.
Further, step B) comprises:
B1) establishing corresponding rules through log analysis;
B2) establishing whitelist combination rules and blacklist combination rules;
B3) establishing combination rules with configurable score values.
Further, in step C) the rules are loaded and stored in a multiway tree structure.
Further, step D) is the matching flow for combination rules: following the storage structure of the combination rules, matching starts from the head node and continues until every child node has been traversed, finally yielding all the anti-spam rules that the mail generates.
In plain terms, comprehensive scoring means checking a mail with a variety of techniques and then combining the individual results to decide whether the mail is spam. In implementation terms, every anti-spam check generates a corresponding rule when it fires, each rule carries a score value, and if the sum of the scores of all generated rules exceeds a threshold, the mail is judged to be spam. The advantage of this architecture is that it reduces the misjudgment of spam and is extensible: developers and operation and maintenance (O&M) personnel only need to keep improving the anti-spam rule base.
A combination rule is a new rule formed by integrating several sub-rules. These sub-rules may be generated by the anti-spam system's own technical checks or configured dynamically by O&M personnel. The matching condition of the new rule can be set dynamically by O&M personnel according to the actual situation, and it supports conditional expressions that join sub-rules with logical OR and logical AND. A mail that matches the new rule can be given a rule score, or the rule can directly trigger a handling mechanism such as a blacklist or whitelist. In short, rule generation and handling are flexible and extensible, and setting such rules addresses both missed spam and misjudged legitimate mail.
This method is a substantial supplement to the comprehensive scoring system and is currently an industry first in the market. Its advantage is that it gives O&M personnel a spam handling mechanism that can be extended dynamically, combined freely and configured flexibly. For example, in daily operation some new spam may successfully bypass the existing filtering techniques. Without developers adding any new technical means, O&M personnel can analyze large volumes of logs to find which rules these mails have in common, and then set a combination rule to handle this kind of problem.
The present invention develops a new filtering technique on the basis of the prior art. The spam filtering method based on combination rules greatly improves spam filtering capability and reduces the likelihood of misjudgment.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Figure 1, the present invention is implemented as a spam filtering method based on combination rules, whose main steps are as follows.
A) Establishing sub-rules.
(1) Spam is identified with various filtering techniques, such as IP filtering, reverse domain name resolution, SMTP frequency control, user blacklists and whitelists, keyword filtering, mail content filtering, RBL, user behavior filtering and rule scoring, and each technique generates corresponding rules. These rules can be assigned a spam score directly, or can serve as the basis for the combination rules set later. The filtering techniques of an anti-spam system are updated constantly as the market changes, and every new filtering technique likewise produces corresponding sub-rules.
(2) O&M personnel can also set sub-rules according to fields such as the mail subject, mail header, sender address, PTR record, recipient address and sender alias. For example, a rule can be set as follows:
[BASE_GMAIL_PTR]
PTR=\\.google\\.com
Note: rule BASE_GMAIL_PTR is matched if the PTR record contains google.com.
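A sub-rule of this kind can be checked with a regular expression search over the PTR record. The following Python sketch is illustrative only: the helper names `compile_condition` and `matches_ptr` are hypothetical, the sample host names are made up, and it assumes that the doubled backslashes in the configuration value are an escaping convention of the configuration file that the loader reduces to single backslashes.

```python
import re


def compile_condition(raw_value):
    """Compile a rule condition read from the configuration file.

    Assumption: the file doubles every backslash, so the pair is
    collapsed to a single backslash before the pattern is compiled.
    """
    return re.compile(raw_value.replace("\\\\", "\\"))


# Condition of rule BASE_GMAIL_PTR exactly as written in the configuration.
BASE_GMAIL_PTR = compile_condition(r"\\.google\\.com")


def matches_ptr(ptr_record):
    """Return True if the PTR record contains the literal text .google.com."""
    return BASE_GMAIL_PTR.search(ptr_record) is not None


if __name__ == "__main__":
    for host in ("mail-sample.google.com", "mail.googlexcom.example"):
        print(host, matches_ptr(host))
```

Because the dots are escaped, a host name that merely contains the letters "google" followed by some other character and "com" does not match.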
In short, the rules generated in (1) and (2) are all base rules, and all subsequent combination rules are built on top of them.
B) Establishing combination rules.
Combination rules are established in the following forms.
(1) Establishing rules through log analysis
By mining large volumes of operation logs, O&M personnel can pick out spam that was missed, analyze which sub-rules such mails have in common, and refine those sub-rules into a new filtering rule. For example, suppose log analysis finds some missed spam mails that share certain features, and the rules generated by those features are a, b, c and d. O&M personnel can then set a combination rule e, configured as follows:
[e]
MetaEvalRule="a and b and c and d"
Note: rule e is matched if a mail generates all four rules a, b, c and d.
(2) Establishing whitelist combination rules
In daily operation, some mails can be delivered directly without going through the spam filtering flow, for example operational mails. To ensure that such mails are delivered successfully, a whitelist combination rule can be set so that a mail is let through as soon as it hits the rule. For example, if the subjects of such operational mails contain the keyword "credit card", a sub-rule like the following can be set:
[fn_subject]
SubjectKey="credit card"
Note: rule fn_subject is matched if the mail subject contains "credit card".
Of course, checking only the mail subject is prone to misjudgment, so a sub-rule with a sender address condition can be added, configured as follows:
[fn_from]
Sender="llll\\@139\\.com"
Note: rule fn_from is matched if the sender address is llll@139.com.
Based on the two sub-rules above, a combination rule for whitelisted operational delivery can then be established, configured as follows:
[fn_white]
MetaEvalRule="fn_subject and fn_from"
ActionID="2"  # a value of 2 denotes the whitelist action
Note: if a mail satisfies both fn_subject and fn_from, the whitelist rule fn_white is generated, and any mail matching that rule is let through directly.
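One way to read rule sections like these is with a standard INI parser. The sketch below uses Python's `configparser`. It assumes that the rule file is INI-compatible, that `#` starts an inline comment, and that values are wrapped in double quotes. The function name `load_rules` is hypothetical and is not taken from the described system.

```python
import configparser

# The three whitelist-related sections, written as they might appear in a file.
CONFIG_TEXT = r'''
[fn_subject]
SubjectKey="credit card"

[fn_from]
Sender="llll\\@139\\.com"

[fn_white]
MetaEvalRule="fn_subject and fn_from"
ActionID="2"  # a value of 2 denotes the whitelist action
'''


def load_rules(text):
    """Parse rule sections into {rule_name: {key: value}} in file order."""
    parser = configparser.ConfigParser(
        interpolation=None,             # keep values literal
        inline_comment_prefixes=("#",)  # drop trailing "# ..." comments
    )
    parser.optionxform = str            # preserve the case of keys
    parser.read_string(text)
    return {
        section: {key: value.strip().strip('"')
                  for key, value in parser[section].items()}
        for section in parser.sections()
    }


if __name__ == "__main__":
    for name, fields in load_rules(CONFIG_TEXT).items():
        print(name, fields)
```

Keeping the sections in file order matters only if the loader relies on it; a loader that orders rules by dependency does not need the file to be sorted.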
(3) Establishing blacklist combination rules
Blacklist combination rules are set on essentially the same principle as whitelist combination rules; only the action value changes. For example, suppose log analysis finds some missed spam mails that all match the sub-rules a, b and c. If these rules together show obvious spam characteristics, a blacklist combination rule can be set as follows:
[fn_black]
MetaEvalRule="a and b and c"
ActionID="3"  # a value of 3 denotes the blacklist action
Note: if a mail satisfies rules a, b and c, the blacklist rule fn_black is generated, and delivery of any mail matching that rule is rejected directly.
(4) Establishing combination rules with configurable score values
To reduce misjudgment in spam filtering, the system adopts comprehensive scoring: each rule has a corresponding score value, and a mail is judged to be spam if the sum of the scores of all its rules exceeds a threshold. Score values can be set not only on base rules but also on combination rules, which extends the rule scoring system. For example, suppose a kind of missed spam always generates rules a and b, yet a and b together are not enough to identify a mail as spam outright, so the mail cannot be intercepted directly with a blacklist rule. In this situation points can be added to such mails, configured as follows:
[fn_jiafen]
MetaEvalRule="a and b"
RuleScore=1.23  # add 1.23 points
Note: if a mail satisfies rules a and b, the bonus rule fn_jiafen is generated, and 1.23 points are added to the comprehensive score of any mail matching that rule.
This free-form score setting can of course be extended at will by O&M personnel. Likewise, when misjudgment occurs, points can be deducted; deduction only requires setting the rule's score to a negative value.
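The effect of bonus and deduction rules on the comprehensive score can be illustrated with a small Python sketch. Only the 1.23 points of fn_jiafen come from the text above; the scores for a and b, the deduction rule `fn_koufen` and the sub-rule `trusted_sender` are hypothetical, and combination rules are simplified here to plain conjunctions.

```python
# Assumed score table: only fn_jiafen's 1.23 is taken from the description.
RULE_SCORES = {"a": 0.03, "b": 0.02, "fn_jiafen": 1.23, "fn_koufen": -0.50}

# Combination rules as conjunctions of sub-rules.
# fn_koufen is a hypothetical deduction rule with a negative score.
COMBINATIONS = {
    "fn_jiafen": ("a", "b"),
    "fn_koufen": ("a", "trusted_sender"),
}


def comprehensive_score(sub_rule_hits):
    """Return (total score, all hit rules) for the given sub-rule hits."""
    hits = set(sub_rule_hits)
    for name, required in COMBINATIONS.items():
        if all(rule in hits for rule in required):
            hits.add(name)  # the combination rule is generated
    total = sum(RULE_SCORES.get(name, 0.0) for name in hits)
    return total, hits


if __name__ == "__main__":
    print(comprehensive_score(["a", "b"]))
    print(comprehensive_score(["a", "b", "trusted_sender"]))
```

With these assumed values, a mail that hits a and b gains the bonus, while one that also hits the hypothetical `trusted_sender` sub-rule has 0.50 points deducted again.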
C) Loading and storing rules.
All combination rules are set in a configuration file, which the program loads into memory automatically before matching the rules in order. Because matching a combination rule depends on its sub-rules, the matching order is essential: if the order is reversed, the rule cannot be matched. The order in which rules are loaded into memory must therefore follow a principle: for a combination rule with dependencies, the rules it depends on must be matched first. To this end, rules are loaded into memory as a multiway tree in which every sub-rule that a rule depends on is either the parent node of that rule or a sibling of its parent. For example, consider the following rules:
[a]
SubjectKey="credit card"  # to match sub-rule a, the mail subject must contain "credit card"
RuleScore=0.03  # matching sub-rule a adds 0.03 points
[b]
Sender="ll@139.com"  # to match sub-rule b, the sender address must be ll@139.com
RuleScore=0.02  # matching sub-rule b adds 0.02 points
[c]
MetaEvalRule="a and b"  # to match combination rule c, rules a and b must both be matched
The three rules above are stored in memory as shown in Figure 2 (a and b are parent nodes of c).
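The loading principle can be sketched in Python as follows. This is one possible reading, not the layout of Figure 2 itself: it assumes a virtual root node, places sub-rules directly under the root, and attaches each combination rule under its deepest dependency, so that every dependency sits at a shallower level than the rule that needs it. The names `RuleNode`, `dependencies` and `build_tree` are hypothetical, and the sketch assumes that rules have no cyclic dependencies.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RuleNode:
    name: str
    depends_on: list = field(default_factory=list)
    children: list = field(default_factory=list)
    depth: int = 0


def dependencies(expression):
    """Return the rule names referenced by a MetaEvalRule expression."""
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression)
    return [word for word in words if word not in ("and", "or")]


def build_tree(rules):
    """Build the tree from {rule name: MetaEvalRule expression or None}."""
    root = RuleNode("<root>")
    nodes = {}

    def place(name):
        if name in nodes:
            return nodes[name]
        expression = rules[name]
        deps = [place(dep) for dep in dependencies(expression)] if expression else []
        # A sub-rule hangs off the root; a combination rule hangs off its
        # deepest dependency, so all dependencies are on shallower levels.
        parent = max(deps, key=lambda node: node.depth) if deps else root
        node = RuleNode(name, [dep.name for dep in deps], depth=parent.depth + 1)
        parent.children.append(node)
        nodes[name] = node
        return node

    for name in rules:
        place(name)
    return root


if __name__ == "__main__":
    tree = build_tree({"c": "a and b", "a": None, "b": None})
    print([child.name for child in tree.children])
    print([child.name for child in tree.children[0].children])
```

For rules a, b and c this places a and b under the root and c under a, so b is a sibling of c's parent, which is consistent with the parent-or-sibling-of-parent principle stated above.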
D) Matching rules.
The matching flow for combination rules is intricate: rules that other rules depend on must be matched first, otherwise the dependent rules cannot be matched. Take the three rules a, b and c described in C): if a and b are not matched first, c will never be matched. Rules are therefore matched starting from the root node of the tree and proceeding in breadth-first order until every node has been traversed.
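A breadth-first traversal of such a tree can be sketched in Python with a queue. The tree is represented here as a plain dictionary from node name to child names, the virtual root name `<root>` is an assumption, and combination rules are simplified to conjunctions of their dependencies; the function names are hypothetical.

```python
from collections import deque


def breadth_first_order(children, root):
    """Return rule names level by level, skipping the virtual root."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node != root:
            order.append(node)
        queue.extend(children.get(node, []))
    return order


def match_in_order(children, root, sub_rule_hits, dependencies):
    """Match rules in breadth-first order.

    sub_rule_hits: names of sub-rules whose own condition matched.
    dependencies:  {combination rule: [rules it requires]}.
    """
    matched = set()
    for name in breadth_first_order(children, root):
        required = dependencies.get(name)
        if required is None:
            if name in sub_rule_hits:   # plain sub-rule
                matched.add(name)
        elif all(rule in matched for rule in required):
            matched.add(name)           # all dependencies already matched
    return matched


if __name__ == "__main__":
    tree = {"<root>": ["a", "b"], "a": ["c"]}
    print(breadth_first_order(tree, "<root>"))
    print(match_in_order(tree, "<root>", {"a", "b"}, {"c": ["a", "b"]}))
```

Because c sits one level below a and b, it is only evaluated after both of them have been visited.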
Two checking algorithms are used for rule matching. For sub-rules such as a and b in C), the conditions are checked mainly with a regular expression library. For combination rules such as c in C), the system implements a matching algorithm for logical expressions that supports and, or, !, (), &&, > and <, which O&M personnel can combine freely according to the actual situation.
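A logical-expression matcher of this kind can be sketched as a small recursive-descent evaluator. The Python sketch below is not the system's own algorithm. It handles `and`, `or`, `&&`, `!` and parentheses, assumes the usual precedence (`!` binds tightest, then `and`/`&&`, then `or`), and leaves out `>` and `<` because the text does not say what operands they compare. The names `tokenize` and `evaluate` are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(&&|\(|\)|!|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(expression):
    """Split an expression into operators, parentheses and rule names."""
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        found = TOKEN.match(expression, pos)
        if not found:
            raise ValueError(f"unexpected text: {expression[pos:]!r}")
        tokens.append(found.group(1))
        pos = found.end()
    return tokens


def evaluate(expression, matched):
    """Return True if the expression holds for the set of matched rules."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == "or":
            advance()
            right = parse_and()  # always parse, then combine
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() in ("and", "&&"):
            advance()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = peek()
        if token == "(":
            advance()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            advance()
            return value
        if token is None or token in (")", "and", "or", "&&"):
            raise ValueError(f"unexpected token: {token!r}")
        advance()
        return token in matched  # a rule name is true if it was matched

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


if __name__ == "__main__":
    print(evaluate("a and b and c and d", {"a", "b", "c", "d"}))
    print(evaluate("a && (b or !c)", {"a"}))
```

The right-hand operand is always parsed before it is combined, so the parser consumes the whole expression even when the left-hand side already decides the result.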
After the combination rules have been matched, the anti-spam system first checks the generated rules for blacklist or whitelist actions. If a hit rule carries a blacklist or whitelist action, the blacklist or whitelist flow is followed. Otherwise the scores of the generated rules are added and deducted to produce the comprehensive score; if the comprehensive score exceeds a threshold the mail is judged to be spam, and otherwise it follows the normal delivery flow.
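The final decision can be sketched in Python as a short function. The action values 2 and 3 follow the ActionID values in the configuration examples. The result labels, the threshold values in the usage lines, and the choice to check the whitelist before the blacklist when both are hit are assumptions, since the text does not specify them.

```python
WHITELIST_ACTION = "2"  # ActionID value for the whitelist action
BLACKLIST_ACTION = "3"  # ActionID value for the blacklist action


def decide(hit_actions, comprehensive_score, threshold):
    """Return the handling for one mail.

    hit_actions: ActionID values carried by the rules the mail generated.
    Assumption: a whitelist hit takes precedence over a blacklist hit.
    """
    if WHITELIST_ACTION in hit_actions:
        return "deliver"   # let through without scoring
    if BLACKLIST_ACTION in hit_actions:
        return "reject"    # refuse delivery without scoring
    if comprehensive_score > threshold:
        return "spam"
    return "deliver"       # normal delivery flow


if __name__ == "__main__":
    print(decide({"2"}, 9.0, 5.0))
    print(decide({"3"}, 0.0, 5.0))
    print(decide(set(), 1.28, 1.0))
```

Checking the list actions before the score means a whitelisted mail is delivered even when its score alone would exceed the threshold.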
In short, matching relies on a regular expression algorithm and on an independently developed algorithm that interprets logical expressions. Both the condition matching within rules and the combination of rules are therefore flexible, and O&M personnel can set combination rules according to the actual situation to catch spam while reducing misjudgment.