CN100589453C - Processing device and method for anti-junk mails - Google Patents

Info

Publication number: CN100589453C
Authority: CN (China)
Prior art keywords: mail, spam, template, send, legitimate
Legal status: Active
Application number: CN200610001083A
Other languages: Chinese (zh)
Other versions: CN101005462A (en)
Inventor: 王晖
Current assignee: Tencent Technology Shenzhen Co Ltd
Original assignee: Tencent Technology Shenzhen Co Ltd
Application CN200610001083A filed by Tencent Technology Shenzhen Co Ltd
Publication of CN101005462A
Application granted
Publication of CN100589453C

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The apparatus comprises a mail receiving/sending unit, a general mail control unit, an anti-spam processing unit and a mail database. The mail database stores spam templates and legitimate mail templates in advance. After a request to receive or send mail arrives, the mail receiving/sending unit passes the mail to the anti-spam processing unit, which compares it with the spam templates and the legitimate mail templates, labels the mail with a type identifier according to the comparison result, and then either passes the mail to the general mail control unit for further processing or returns it to the mail receiving/sending unit.

Description

Anti-spam mail processing device and method
Technical field
The present invention relates to e-mail processing technology, and in particular to an anti-spam mail processing device and method.
Background art
Spam is currently rampant on the Internet and causes great trouble for many network users. To address this problem, Internet service providers are researching anti-spam methods so as to filter spam on the network more effectively.
Existing anti-spam methods mainly include: 1) rule-based filtering of spam by cascaded keyword/feature matching; 2) statistical classification of spam, for example with Bayesian algorithms; 3) sender blacklists and whitelists used to intercept spam; 4) controls on sending behavior, such as flow limiting. These methods can be used alone or in combination, and are suitable for mail processing devices on mail servers and/or personal user clients. Fig. 1 shows the structure of an existing mail processing device, which comprises a mail receiving/sending unit 11 and a general mail control unit 12.
The general mail control unit 12 encodes/decodes and stores mail, displays mail content to the user, and so on. It further comprises a legitimate mail processing module and a spam processing module. The legitimate mail processing module performs subsequent processing on mail carrying a legitimate-mail identifier, such as storing it in the inbox or sending it; the spam processing module performs subsequent processing on mail carrying a spam identifier, such as placing it in the trash folder, or refusing or limiting its sending.
The mail receiving/sending unit 11 receives instructions from the general mail control unit 12 and, according to existing mail transfer protocols, receives/sends new mail between mail servers and/or personal user clients.
Practical experience shows that the above anti-spam methods only counter spam passively; spammers can easily adopt countermeasures that defeat them.
For example, against checksum methods such as MD5 (Message-Digest algorithm 5) digests, spammers introduce small random changes into each spam message, so that spam messages are no longer strictly identical.
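The weakness just described can be illustrated with a minimal Python sketch using only the standard `hashlib` module: two hypothetical spam bodies that differ in a single random token produce unrelated MD5 digests, so a digest blacklist built from one never matches the other. The message texts are invented for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a UTF-8 encoded message body."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two hypothetical spam bodies, identical except for a random reference token.
spam_a = "Cheap watches at example.com! ref=8841"
spam_b = "Cheap watches at example.com! ref=1937"

digest_a = md5_hex(spam_a)
digest_b = md5_hex(spam_b)

# The bodies are nearly identical, but a blacklist keyed on digest_a misses spam_b.
print(digest_a == digest_b)  # False
```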
Against statistical methods such as Bayesian filtering, spammers insert large numbers of distracting words/characters into the mail to skew the feature proportions used for spam identification.
Against techniques such as blacklists/whitelists and sending-address filtering, spammers change the sender/sending address so that the same mail is sent from different server IPs.
Against language-dependent methods, spammers change the language or the alphabet/character set of the message body, so that techniques such as Chinese word segmentation cannot effectively filter out the spam.
Against flow-control methods, spammers reduce their own sending frequency, or control the number of mails sent from the same account, so that the spam cannot be identified.
As the above shows, nearly all commonly used anti-spam techniques have been circumvented by spammers. Anti-spam techniques are updated far more slowly than spam-sending techniques, so the network has never been able to provide satisfactory anti-spam performance.
Summary of the invention
In view of this, a main object of the present invention is to provide an anti-spam mail processing device that, when receiving/sending mail, identifies and filters spam on the basis of mail templates, thereby achieving better anti-spam performance.
Another object of the present invention is to provide an anti-spam mail processing method that identifies and filters spam using mail templates, and further generates and adjusts the mail templates by actively extracting mail features.
To achieve the above objects, the technical solution of the present invention is implemented as follows:
An anti-spam mail processing device comprises a mail receiving/sending unit and a general mail control unit; the mail processing device further comprises an anti-spam processing unit and a mail database;
The mail database comprises a legitimate mail list library, a spam template library and an all-mail sample library; the legitimate mail list library stores legitimate mail templates, and the spam template library stores spam templates;
After receiving a receive/send request, the mail receiving/sending unit passes the mail to be received/sent to the anti-spam processing unit and saves it in the all-mail sample library;
The anti-spam processing unit obtains the legitimate mail templates from the legitimate mail list library and compares them with the mail to be received/sent; if one of the legitimate mail templates is similar to the mail, the mail is labeled with a legitimate-mail identifier;
Otherwise, the unit obtains the spam templates from the spam template library and compares them with the mail; if one of the spam templates is similar to the mail, the mail is labeled with a spam identifier;
Otherwise, the unit searches the all-mail sample library for mails similar to the mail to be received/sent; when the number of similar mails exceeds a predetermined value, it extracts a mail template for the mail using the Alignment comparison algorithm, labels the mail with a spam identifier or a legitimate-mail identifier according to preset information, and, according to that type identifier, saves the extracted mail template in the corresponding mail database;
The anti-spam processing unit passes the mail, now labeled with its type identifier, to the general mail control unit for processing control, or returns it to the mail receiving/sending unit.
The anti-spam processing unit comprises a spam determination module and a mail template extraction module;
After receiving the mail to be received/sent, the spam determination module issues query requests to the legitimate mail list library and the spam template library to determine the mail's type, and, when the number of mails similar to it exceeds the predetermined value, sends a template extraction instruction to the mail template extraction module;
After receiving this instruction, the mail template extraction module extracts a mail template using the Alignment comparison algorithm and saves it in the mail database of the corresponding type.
The mail processing device further comprises an all-mail library maintenance module, a mail list library maintenance module and a spam template library maintenance module;
Operations on the all-mail sample library, the legitimate mail list library and the spam template library are then performed through the all-mail library maintenance module, the mail list library maintenance module and the spam template library maintenance module, respectively.
The mail processing device further comprises an error feedback processing module, which sends a modification request to the anti-spam processing unit according to user feedback; after receiving the request, the anti-spam processing unit sends a modification notice to the mail list library maintenance module and/or the spam template library maintenance module to modify the corresponding data records in the legitimate mail list library and/or the spam template library;
Alternatively, the error feedback processing module sends the modification notice directly to the mail list library maintenance module and/or the spam template library maintenance module to modify the corresponding data records in the legitimate mail list library and/or the spam template library.
When the mail processing device is installed on a personal user client, the general mail control unit, at the user's request, sends a modification notice to the mail list library maintenance module and/or the spam template library maintenance module to modify the corresponding data records in the legitimate mail list library and/or the spam template library.
When the mail receiving/sending unit receives a mail, the anti-spam processing unit determines the mail's type, attaches the corresponding type identifier, and passes the mail to the general mail control unit;
According to the type identifier, the general mail control unit saves the mail in the corresponding location and displays it to the user: mail with a legitimate-mail identifier is saved in the inbox, and mail with a spam identifier is saved in the trash folder.
The mail processing device may also be installed on a mail server.
An anti-spam mail processing method is applied to a mail processing device comprising a mail receiving/sending unit, a general mail control unit, an anti-spam processing unit and a mail database, wherein the mail database comprises a legitimate mail list library storing legitimate mail templates, a spam template library storing spam templates, and an all-mail sample library storing unprocessed mail samples. The method comprises the following steps:
a. When receiving/sending mail, the mail receiving/sending unit passes the mail to be received/sent to the anti-spam processing unit and saves it in the all-mail sample library;
b. According to a comparison strategy, the anti-spam processing unit compares the legitimate mail templates in the legitimate mail list library one by one with the mail, and determines whether any of them is similar to the mail; if so, it labels the mail with a legitimate-mail identifier; otherwise step c is executed;
c. According to the comparison strategy, the anti-spam processing unit compares the spam templates in the spam template library one by one with the mail, and determines whether any of them is similar to the mail; if so, it labels the mail with a spam identifier; otherwise step d is executed;
d. The anti-spam processing unit searches the all-mail sample library for mails similar to the mail and determines whether their number exceeds a predetermined value; if it does, the unit uses the Alignment comparison algorithm to extract the content shared by the mail and its similar mails and generate a new mail template, determines the type of the mail according to preset information, labels the mail with the corresponding type identifier, and saves the mail template in the mail database corresponding to the determined type;
e. The anti-spam processing unit passes the labeled mail to the general mail control unit for processing control, or returns it to the mail receiving/sending unit.
In step b or c, the anti-spam processing unit obtains the comparison result using the Alignment comparison algorithm.
In step d, the anti-spam processing unit determines the mail type according to blacklists and whitelists.
The comparison strategy comprises comparing mail content, comparing mail format, comparing the communication command sequence used to send the mail, or any combination of the three.
In step b or step c, whether a template similar to the mail to be received/sent exists is determined as follows: a similarity threshold is preset, and each one-by-one comparison result is checked against the corresponding threshold; if the result exceeds the threshold, the template is similar to the mail; otherwise it is not.
As the above technical solution shows, the anti-spam mail processing device of the present invention adds an anti-spam processing unit to an existing mail processing device. The unit compares the mail to be received/sent with templates stored in advance in the mail database and distinguishes spam from legitimate mail based on existing data; after determining the mail type, it extracts the mail's features and saves them in the mail database as new mail templates for subsequent comparisons. The mail database is thus updated dynamically, which improves the reliability with which the device identifies spam, makes its anti-spam defense more flexible, and achieves better anti-spam performance. Here, the mail processing device may be a mail server or a mail program on a personal user client.
In addition, the anti-spam mail processing method of the present invention extracts features such as the sending behavior, content and structure of large numbers of mails to generate spam templates and legitimate mail templates respectively, and keeps adjusting these templates dynamically. It then uses the Alignment comparison algorithm to compare the mail to be received/sent with these templates and, by judging their similarity, distinguishes and filters out spam. The method is therefore practical and can improve the accuracy of spam identification.
Based on this anti-spam technique, the present invention intercepts spam more effectively and keeps the normal communication of mail system users free from spam interference. Furthermore, a mail system can use the method to set spam early-warning and processing policies, for example offering anti-spam service at different levels of granularity to paying VIP users and to ordinary free users.
Description of drawings
Fig. 1 shows the structure of a prior-art mail processing device;
Fig. 2 is a schematic diagram of string comparison using the Alignment comparison algorithm in a preferred embodiment of the present invention;
Fig. 3 shows the structure of the anti-spam mail processing device of the present invention;
Fig. 4 shows the detailed flow of anti-spam processing in the present invention.
Embodiment
To make the objects, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments.
The anti-spam device and method of the present invention actively look for regularities in mail with respect to sending behavior, sent content, structure and so on, and introduce Alignment technology into the mail system to extract spam templates and/or legitimate mail templates as the criteria for identifying spam.
Alignment technology is a method from bioinformatics, widely used to find identical strings in DNA sequences, for example to reveal whether a given segment of sequence occurs frequently in a DNA database. The characters of a DNA sequence are the nucleotide letters A, T, C and G; other characters such as spaces, line breaks, special symbols and non-English letters are all ignored in bioinformatics. When Alignment technology compares two input strings, it allows operations such as deleting characters from, or inserting gaps into, the input strings, so that the two strings are aligned before being compared; in this way the best possible match between them can be found.
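To make the alignment idea concrete, here is a minimal Python sketch of global alignment using the textbook Needleman-Wunsch dynamic program. This is an assumed stand-in: the patent names Fasta and Blast, which are faster heuristic aligners, and the scoring values (match +1, mismatch -1, gap -1) are illustrative choices. The function returns three rows in the style of Fig. 2: '-' marks an inserted gap, '|' a matching column and '.' a non-matching one. Because Python strings are Unicode, the same code also accepts Chinese characters, whitespace and control characters.

```python
from typing import Tuple


def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> Tuple[str, str, str]:
    """Needleman-Wunsch global alignment of two strings.

    Returns (top, midline, bottom) rows like Fig. 2: '-' is an inserted gap,
    '|' a matching column, '.' a non-matching column (mismatch or gap).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to rebuild the three rows.
    top, mid, bottom = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            mid.append("|" if a[i - 1] == b[j - 1] else ".")
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])  # a character of a faces a gap in b
            bottom.append("-")
            mid.append(".")
            i -= 1
        else:
            top.append("-")  # a gap in a faces a character of b
            bottom.append(b[j - 1])
            mid.append(".")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(mid)), "".join(reversed(bottom))


for row in align("GATTACA", "GATACA"):
    print(row)
```

Gaps are counted as non-matching columns here, which is one reasonable reading of the figure's '|'/'.' convention.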
To apply Alignment technology to the mail system, the present invention extends its scope to cover all ASCII characters, special characters (such as carriage return, line feed and whitespace) and Chinese characters (character codes 127~256), so that Alignment technology can process any Chinese or English string.
Fig. 2 is a schematic diagram of string comparison with Alignment technology: the top and bottom rows are the input strings being compared, and '-' denotes an inserted gap; the middle row is the comparison result, where '|' denotes a match and '.' a mismatch. Once the comparison result is obtained, whether the two input strings are similar can be judged against a set criterion; for example, for strings of 25 characters, if the comparison result contains more than 13 '|' characters, the two input strings are judged similar.
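The 25-character example amounts to a match-ratio threshold of 13/25. A minimal sketch, assuming the comparison row uses '|' for matches as in Fig. 2; `is_similar` and its default ratio are illustrative, and `Fraction` keeps the comparison exact right at the boundary.

```python
from fractions import Fraction


def is_similar(midline: str, min_ratio: Fraction = Fraction(13, 25)) -> bool:
    """Judge two aligned strings similar when the share of '|' (match) columns in
    the comparison row exceeds min_ratio; 13/25 mirrors the 25-character example."""
    matches = midline.count("|")
    # Fraction arithmetic avoids float rounding exactly at the threshold.
    return matches > min_ratio * len(midline)


print(is_similar("|" * 14 + "." * 11))  # True: 14 of 25 columns match
print(is_similar("|" * 13 + "." * 12))  # False: 13 is not more than 13
```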
As Fig. 2 shows, Alignment technology can reveal the content two strings have in common. In the e-mail setting, specific Alignment comparison algorithms such as Fasta and Blast can therefore be used to extract spam templates and/or legitimate mail templates hidden in large numbers of mails, such as the XML text and structure of a web page, or a template for article titles. This makes anti-spam technology more targeted and improves the accuracy of spam interception. Taking Fig. 2 again as an example, after the two input strings are compared, taking the content they share yields a new template.
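Taking the shared content as a new template can be sketched with Python's standard `difflib.SequenceMatcher`. Note that this swaps in difflib's Ratcliff/Obershelp-style block matching for the Fasta/Blast-style alignment the text names; the `WILDCARD` marker and the `min_block` cutoff are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher

WILDCARD = "*"  # hypothetical marker for the spots where the two mails differ


def extract_template(a: str, b: str, min_block: int = 3) -> str:
    """Keep the runs of text that a and b share, joined by WILDCARD.

    Runs shorter than min_block characters are treated as coincidental and dropped.
    """
    # autojunk=False stops difflib from ignoring frequent characters in long mails.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size >= min_block]
    return WILDCARD.join(shared)


template = extract_template("Hello Alice, win a prize at example.com",
                            "Hello Bob, win a prize at example.com")
print(template)  # Hello *, win a prize at example.com
```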
To obtain spam templates and/or legitimate mail templates, a mail database must be set up to store large numbers of e-mails as comparison objects, and a large body of comparison results must be accumulated over a period of time for reference. When the method is first deployed, the mail databases (the all-mail sample library, the legitimate mail list library and the spam template library) are all empty. From then on, whenever the mail processing device receives or sends a mail, it saves that mail in the all-mail sample library as a sample.
Once the all-mail sample library holds a certain number of samples, the mail processing device starts working: it uses the Alignment comparison algorithm to compare each newly received mail with the samples in the all-mail sample library, and judges the mail's type from the degree of similarity. To improve efficiency, only samples close in size to the mail may be selected for comparison.
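A minimal sketch of the sample library's warm-up and size pre-filter, assuming an in-memory list of mail bodies. `SampleLibrary`, the 1000-sample warm-up and the ±20% size tolerance are all hypothetical; the text only says "a certain number" of samples and "close in size".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SampleLibrary:
    """Hypothetical in-memory stand-in for the all-mail sample library."""
    min_samples: int = 1000      # assumed warm-up size before classification starts
    size_tolerance: float = 0.2  # assumed: only compare samples within +/-20% of the mail's length
    samples: List[str] = field(default_factory=list)

    def add(self, mail: str) -> None:
        """Every received or sent mail is stored as a sample."""
        self.samples.append(mail)

    def is_ready(self) -> bool:
        """The device starts classifying once enough samples have accumulated."""
        return len(self.samples) >= self.min_samples

    def candidates(self, mail: str) -> List[str]:
        """Return only the samples whose length is close to the mail's length."""
        low = len(mail) * (1 - self.size_tolerance)
        high = len(mail) * (1 + self.size_tolerance)
        return [s for s in self.samples if low <= len(s) <= high]
```

Filtering by length first keeps the expensive alignment step to a small candidate set.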
Fig. 3 shows the structure of the anti-spam mail processing device of the present invention, which comprises: a mail receiving/sending unit 31, a general mail control unit 32, an anti-spam processing unit 33, an all-mail sample library 34, a legitimate mail list library 35 and a spam template library 36.
The mail receiving/sending unit 31 and the general mail control unit 32 work as in the prior art and are not described again here.
The anti-spam processing unit 33 further comprises a spam determination module 331 and a mail template extraction module 332.
The all-mail sample library 34 stores unprocessed mail samples. Each data record in the legitimate mail list library 35 is a legitimate mail template, generated by comparing one or more legitimate mails and extracting the content they share. The spam template library 36 stores spam templates, which are generated in a similar way to legitimate mail templates and are not described again here.
To avoid operating directly on the mail databases (the all-mail sample library 34, the legitimate mail list library 35 and the spam template library 36), a maintenance module can be provided for each database: an all-mail library maintenance module 341, a mail list library maintenance module 351 and a spam template library maintenance module 361. All operations on a database are then issued through its maintenance module; these operations include fetching data records from the mail database and adding, deleting and modifying records in it.
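Each maintenance module acts as a thin access layer in front of its database. A minimal sketch, assuming records are strings keyed by an integer ID and held in memory; `MaintenanceModule` and its method names are hypothetical.

```python
from typing import Dict, List, Optional


class MaintenanceModule:
    """Hypothetical maintenance module: the only component that touches its
    database, exposing add/get/modify/delete on records keyed by record ID."""

    def __init__(self) -> None:
        self._records: Dict[int, str] = {}  # stand-in for the real mail database
        self._next_id = 1

    def add(self, record: str) -> int:
        record_id = self._next_id
        self._records[record_id] = record
        self._next_id += 1
        return record_id

    def get(self, record_id: int) -> Optional[str]:
        return self._records.get(record_id)

    def modify(self, record_id: int, record: str) -> bool:
        if record_id not in self._records:
            return False
        self._records[record_id] = record
        return True

    def delete(self, record_id: int) -> bool:
        return self._records.pop(record_id, None) is not None

    def all_records(self) -> List[str]:
        return list(self._records.values())
```

Routing every access through one object per library means storage details can change without touching the spam determination logic.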
After receiving a receive/send request, the mail receiving/sending unit 31 passes the mail to the spam determination module 331 for processing and, at the same time, saves it in the all-mail sample library 34 through the all-mail library maintenance module 341.
After receiving the new mail, the spam determination module 331 sends query requests to the all-mail sample library 34, the legitimate mail list library 35 and the spam template library 36 through the all-mail library maintenance module 341, the mail list library maintenance module 351 and the spam template library maintenance module 361, respectively, and judges the type of the new mail from the data records in these databases. The spam determination module 331 then labels the mail with a legitimate or spam identifier and passes it to the general mail control unit 32 for processing control, or returns it to the mail receiving/sending unit 31.
In addition, based on the judgment of the spam determination module 331, the mail template extraction module 332 can extract a new mail template and send it to the legitimate mail list library 35 or the spam template library 36: if the new mail is legitimate, a new legitimate mail template is obtained, and likewise for spam. The anti-spam device of the present invention can therefore dynamically generate and adjust its own mail templates, which makes the anti-spam technique it provides hard to circumvent.
The error feedback processing module 37 adjusts records in the relevant databases according to feedback from users/administrators, such as reports of misjudged mail. It can send a record adjustment request to the spam determination module 331, which then adjusts the records in the relevant databases; alternatively, it can send the record adjustment request directly to the mail list library maintenance module 351 and/or the spam template library maintenance module 361 to adjust the records in the relevant databases.
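One possible shape of the feedback adjustment, sketched under the assumption that each library is an in-memory set of template strings and that a feedback report names the template and the correct type. `apply_feedback` is a hypothetical helper, not an interface described in the patent.

```python
from typing import Set


def apply_feedback(template: str, reported_as: str,
                   legit_templates: Set[str], spam_templates: Set[str]) -> None:
    """Hypothetical error-feedback handler: move a template into the library that
    matches the user's report ('legitimate' or 'spam')."""
    if reported_as == "legitimate":
        spam_templates.discard(template)  # a false positive: stop treating it as spam
        legit_templates.add(template)
    elif reported_as == "spam":
        legit_templates.discard(template)  # a false negative: stop treating it as legitimate
        spam_templates.add(template)
    else:
        raise ValueError(f"unknown report type: {reported_as!r}")
```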
In addition, if the mail processing device of the present invention is installed on a personal user client, the general mail control unit 32 can also, at the user's request, send a record adjustment request directly to the mail list library maintenance module 351 and/or the spam template library maintenance module 361 to operate on the records of the relevant databases.
Based on the above anti-spam mail processing device, the processing flow of the anti-spam technique of the present invention is shown in Fig. 4 and comprises the following steps:
Step 401: after the mail processing device produces a new mail to be received/sent, it saves the mail in the all-mail sample library and passes it to the anti-spam processing unit for type determination.
Step 402: after the spam determination module in the anti-spam processing unit receives the mail, it sends a query request to the mail list library maintenance module and judges from the data records in the legitimate mail list library whether the mail is legitimate; if so, step 409 is executed; otherwise step 403 is executed.
Each data record in the legitimate mail list library is a legitimate mail template, generated by comparing legitimate mails with the Alignment comparison algorithm and extracting the content they share. The spam determination module compares the original content of the mail to be received/sent with each legitimate mail template, one by one, using the Alignment comparison algorithm. Because the original content of the mail is in plain text format, the Alignment comparison algorithm can process it as a string.
For the actual comparison, comparison strategies can be set according to the characteristics of e-mail, such as comparing the similarity of e-mail content (including the mail header), comparing the similarity of e-mail format, or, when analyzing the sending behavior of spam, comparing the similarity of the communication command sequences used to send the e-mail.
When comparing e-mail format, each mail is converted into a layout sequence in order of appearance, and the layout sequences are then compared pairwise as strings. Typically only three characters, TAB (\t), carriage return (\r) and line feed (\n), are treated as layout characters, so a possible layout sequence is "\t\r\n\r\n\r\n". In practice, the layout characters that make up an e-mail's format can be configured as needed, for example by also treating punctuation marks as layout characters. E-mail content and/or communication command sequences can likewise be compared in a similar way to obtain comparison results.
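The format comparison reduces a mail body to its layout characters before alignment. A minimal sketch, assuming the default layout set is TAB, CR and LF as stated above; `layout_sequence` is a hypothetical name, and the set can be widened, for example with punctuation, as the text allows.

```python
from typing import FrozenSet

DEFAULT_LAYOUT_CHARS: FrozenSet[str] = frozenset("\t\r\n")


def layout_sequence(body: str, layout_chars: FrozenSet[str] = DEFAULT_LAYOUT_CHARS) -> str:
    """Keep only layout characters, in order of appearance, so that two mails
    can be compared by format rather than by wording."""
    return "".join(ch for ch in body if ch in layout_chars)


print(repr(layout_sequence("Hi\tthere\r\nLine two\r\n\r\nBye")))  # '\t\r\n\r\n\r\n'
```

The resulting sequences can then be fed to any string alignment routine.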
Different similarity thresholds can be set for different compared parts; for example, 95% for the mail header, 80% for the message body, and 98% for attachments such as images. Only when a comparison result exceeds the set similarity threshold are the two judged similar. These thresholds can be adjusted and revised according to user feedback, which is not described further here.
In addition, the comparison results of the different parts can be combined into an overall similarity index according to a specified rule, which then serves as the criterion for identifying legitimate mail or spam. For example, if two input mails have header similarity > 95% and body similarity > 80%, and both contain the same image attachment, the two input mails are judged similar.
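The per-part thresholds and the composite rule from the example can be sketched as follows. Similarities are assumed to be fractions in [0, 1] computed elsewhere; the threshold values are the example figures from the text, and the function names are hypothetical.

```python
from typing import Dict

# Example thresholds from the text; real values would be tuned from user feedback.
THRESHOLDS: Dict[str, float] = {"header": 0.95, "body": 0.80, "attachment": 0.98}


def part_similar(part: str, similarity: float) -> bool:
    """A part matches only if its similarity exceeds (not merely equals) its threshold."""
    return similarity > THRESHOLDS[part]


def mails_similar(header_sim: float, body_sim: float, same_image_attachment: bool) -> bool:
    """Composite rule from the example: header > 95%, body > 80%, and both mails
    carry the same image attachment."""
    return (part_similar("header", header_sim)
            and part_similar("body", body_sim)
            and same_image_attachment)
```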
Through the above process, the spam determination module can easily determine whether the mail is legitimate.
Step 403: the spam determination module sends a query request to the spam template library maintenance module and judges from the data records in the spam template library whether the mail is spam; if so, step 404 is executed; otherwise step 405 is executed.
In this step, spam is determined in the same way as legitimate mail, which is not described again here.
Step 404: the spam determination module marks the mail as spam, assigns it a high-risk spam index, and passes it to the general mail control unit; step 410 is then executed.
The high-risk spam index includes the spam similarity, the number of mails covered by the spam template, and so on. A coverage of 10,000 mails, for example, means that the spam template was extracted from 10,000 spam mails.
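A hypothetical record for the spam index, assuming it carries the risk level, the similarity and the template's coverage count mentioned above; the field names and the `describe` format are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamIndex:
    """Hypothetical shape of the spam index attached in step 404 or step 408."""
    level: str          # "high" (matched a stored template) or "medium" (new template)
    similarity: float   # similarity between the mail and the matching template
    coverage: int       # number of mails the template was extracted from

    def describe(self) -> str:
        return (f"{self.level}-risk spam: similarity {self.similarity:.0%}, "
                f"template covers {self.coverage:,} mails")


print(SpamIndex("high", 0.97, 10000).describe())
```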
Step 405: the spam determination module sends a query request to the all-mail library maintenance module and searches the mail samples stored in the all-mail sample library for mails similar to the mail; if the number of similar mails exceeds a predetermined threshold T, step 406 is executed; otherwise step 407 is executed directly.
Step 406: the spam determination module sends a template extraction instruction to the mail template extraction module. After receiving the instruction, the mail template extraction module selects from the all-mail sample library at least one mail sample similar to the mail, compares the mail with the sample(s) using the Alignment comparison algorithm, and generates a new mail template.
Step 407: the spam determination module judges the mail type according to the supplementary information it has been configured with. If the mail is spam, it notifies the mail template extraction module to save the new mail template (if one was generated in step 406) in the spam template library, and step 408 is executed; if the mail is legitimate, the new mail template is saved in the legitimate mail list library, and step 409 is executed.
The supplementary information refers to blacklists, whitelists and the like. For example, if a mail is sent from a trusted address, that address is put on the whitelist, and whenever the spam determination module subsequently receives mail from the same sending address, it determines that the mail is legitimate. Spam information can be configured in the same way, which is not described again here.
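A minimal sketch of the black/white-list check used as supplementary information in step 407. It assumes addresses are compared case-insensitively and that the whitelist wins if an address appears on both lists; both are assumptions, and `classify_by_lists` is a hypothetical name.

```python
from typing import Optional, Set


def classify_by_lists(sender: str, whitelist: Set[str], blacklist: Set[str]) -> Optional[str]:
    """Return 'legitimate' for a whitelisted sender, 'spam' for a blacklisted one,
    and None when neither list mentions the sender."""
    # Assumption: addresses are normalized to lowercase without surrounding spaces.
    address = sender.strip().lower()
    if address in whitelist:  # assumption: the whitelist takes precedence
        return "legitimate"
    if address in blacklist:
        return "spam"
    return None
```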
Step 408: The spam determination module marks the mail as spam, assigns it a medium spam index, and delivers it to the general mail control unit; step 410 is then executed.
Step 409: The spam determination module marks the mail as legitimate, assigns it a legitimate mail index, and delivers it to the general mail control unit.
Step 410: The general mail control unit performs subsequent processing on the mail according to its type identifier; this is not described further here.
For a mail carrying the spam identifier, the mail processing device refuses to send it or limits how many such mails may be sent; a mail determined to be legitimate is sent normally.
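One possible way to realise "refuse or limit" is a per-sender quota on spam-tagged outgoing mail, sketched below. The `SendGate` class and its `spam_quota` parameter are hypothetical, not taken from the patent; a quota of 0 corresponds to refusing all spam-tagged mail.

```python
from collections import Counter


class SendGate:
    """Hypothetical send policy: legitimate mail always goes out; each sender
    may send at most `spam_quota` spam-tagged mails (0 means refuse all)."""

    def __init__(self, spam_quota: int = 0) -> None:
        self.spam_quota = spam_quota
        self.spam_sent = Counter()  # spam-tagged mails already sent, per sender

    def allow(self, sender: str, label: str) -> bool:
        if label == "legitimate":
            return True
        if self.spam_sent[sender] >= self.spam_quota:
            return False
        self.spam_sent[sender] += 1
        return True
```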
The anti-spam processing described above can be triggered whenever a mail is received or sent, thereby strengthening anti-spam processing during mail sending and receiving. The mail processing device may treat the receiving and sending cases differently when determining the mail type. For example, a received mail requires a query of the all-mail sample library, whereas this step need not be performed when a mail is sent. As another example, when the mail processing device is deployed on a mail server, received mails are compared against a larger mail database, while sent mails need only be compared against a smaller set of mail templates.
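The direction-dependent behaviour in the examples above can be captured as a small scan profile. This is only a sketch: `ScanProfile`, `profile_for` and the scope labels "full" and "reduced" are hypothetical, and keeping the full template set for mail sent from a client is an assumption the text does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    search_sample_library: bool  # whether the all-mail sample library is queried
    template_scope: str          # hypothetical labels: "full" or "reduced"


def profile_for(direction: str, on_server: bool) -> ScanProfile:
    # Received mail: query the sample library and use the full template set.
    # Sent mail: skip the sample-library query; on a server, compare against a
    # smaller template set. Client-side sending keeping "full" is an assumption.
    if direction not in ("receive", "send"):
        raise ValueError(f"unknown direction: {direction}")
    receiving = direction == "receive"
    scope = "reduced" if (on_server and not receiving) else "full"
    return ScanProfile(search_sample_library=receiving, template_scope=scope)
```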
In addition, the above mail processing device can be deployed on both the mail server and the personal user terminal at the same time, giving the whole mail system stronger anti-spam capability. When the personal user terminal sends a mail to the mail server, it first scans the mail against the records in its own mail database to determine the mail's type; after receiving the mail, the mail server determines its type again and, where appropriate, extracts a new mail template. In practice, the anti-spam processing of a mail system is not limited to this arrangement: the mail scanning process can be triggered on any mail processing device as required, which is not described further here.
As the above embodiments show, the anti-spam processing device and method of the present invention compare a mail to be received/sent with templates stored in advance in the mail database, distinguish spam from legitimate mail on the basis of existing data, and, once the mail type is determined, extract the mail's features as a new mail template for subsequent processing, thereby achieving a better anti-spam effect.
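The overall flow of this embodiment can be summarised as a cascade: legitimate templates first, then spam templates, then a search of the sample library with template extraction once more than T similar mails exist. The sketch below illustrates it under several assumptions. Similarity is a token-level `difflib` ratio, and template extraction keeps only tokens common to the mail and all similar samples. The hypothetical `preset_label` callable stands in for the black/white-list lookup. Mails are plain strings, and each mail is added to the sample library after the search so that it is not counted as similar to itself.

```python
from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    # Hypothetical stand-in for the comparison strategy: token-level match ratio.
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def common_template(mail: str, similar: list[str]) -> str:
    # Keep only the tokens that the mail shares, in order, with every similar mail.
    tokens = mail.split()
    for other in similar:
        matcher = SequenceMatcher(None, tokens, other.split(), autojunk=False)
        tokens = [t for blk in matcher.get_matching_blocks()
                  for t in tokens[blk.a:blk.a + blk.size]]
    return " ".join(tokens)


class AntiSpamUnit:
    def __init__(self, preset_label: Callable[[str], str],
                 sim_threshold: float = 0.8, count_threshold: int = 2) -> None:
        self.legit_templates: list[str] = []     # legitimate mail list library
        self.spam_templates: list[str] = []      # spam template library
        self.samples: list[str] = []             # all-mail sample library
        self.preset_label = preset_label         # e.g. a black/white-list lookup
        self.sim_threshold = sim_threshold       # similarity threshold
        self.count_threshold = count_threshold   # predetermined value T

    def _matches_any(self, mail: str, templates: list[str]) -> bool:
        return any(similarity(mail, t) > self.sim_threshold for t in templates)

    def process(self, mail: str) -> str:
        if self._matches_any(mail, self.legit_templates):
            label = "legitimate"
        elif self._matches_any(mail, self.spam_templates):
            label = "spam"
        else:
            similar = [s for s in self.samples
                       if similarity(mail, s) > self.sim_threshold]
            label = self.preset_label(mail)  # step 407: auxiliary information
            if len(similar) > self.count_threshold:  # steps 405 and 406
                template = common_template(mail, similar)
                target = self.spam_templates if label == "spam" else self.legit_templates
                target.append(template)
        # Stored after the search so the mail is not counted as similar to itself.
        self.samples.append(mail)
        return label
```

Feeding four near-identical "prize" mails into a unit with T = 2 labels all four via `preset_label`, and the fourth mail triggers extraction of the spam template "win a free prize click here". Later variants are then caught directly by the spam template library.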

Claims (12)

1. An anti-spam processing device, comprising a mail receiving/sending unit and a general mail control unit, characterized in that the mail processing device further comprises an anti-spam processing unit and a mail database;
the mail database comprises a legitimate mail list library, a spam template library and an all-mail sample library, wherein the legitimate mail list library is used to store legitimate mail templates and the spam template library is used to store spam templates;
after receiving a mail receive/send request, the mail receiving/sending unit delivers the mail to be received/sent to the anti-spam processing unit and stores it in the all-mail sample library;
the anti-spam processing unit obtains the legitimate mail templates from the legitimate mail list library and compares them with the mail to be received/sent; if a legitimate mail template similar to the mail to be received/sent exists, the mail to be received/sent is tagged with a legitimate mail identifier;
otherwise, the anti-spam processing unit obtains the spam templates from the spam template library and compares them with the mail to be received/sent; if a spam template similar to the mail to be received/sent exists, the mail to be received/sent is tagged with a spam identifier;
otherwise, the anti-spam processing unit searches the all-mail sample library for mails similar to the mail to be received/sent; when the number of similar mails exceeds a predetermined value, it extracts a mail template of the mail to be received/sent according to the Alignment comparison algorithm, tags the mail to be received/sent with a spam identifier or a legitimate mail identifier according to preset information, and saves the extracted mail template into the corresponding mail database according to the type identifier;
the anti-spam processing unit delivers the type-tagged mail to be received/sent to the general mail control unit for processing control, or returns it to the mail receiving/sending unit.
2. The device according to claim 1, characterized in that the anti-spam processing unit comprises a spam determination module and a mail template extraction module;
after receiving the mail to be received/sent, the spam determination module sends query requests to the legitimate mail list library and the spam template library respectively to determine the mail type of the mail, and, when the number of mails similar to the mail to be received/sent exceeds the predetermined value, sends a template extraction instruction to the mail template extraction module;
after receiving the instruction, the mail template extraction module extracts a mail template using the Alignment comparison algorithm and saves it into the mail database of the corresponding type.
3. The device according to claim 1 or 2, characterized in that the mail processing device further comprises an all-mail library maintenance module, a mail list library maintenance module and a spam template library maintenance module;
operations on the all-mail sample library, the legitimate mail list library and the spam template library are then performed by the all-mail library maintenance module, the mail list library maintenance module and the spam template library maintenance module, respectively.
4. The device according to claim 3, characterized in that the mail processing device further comprises an error feedback processing module, used to send a modification request to the anti-spam processing unit according to user feedback; after receiving the request, the anti-spam processing unit sends a modification notice to the mail list library maintenance module and/or the spam template library maintenance module to modify the corresponding data records in the legitimate mail list library and/or the spam template library;
or, the error feedback processing module sends the modification notice directly to the mail list library maintenance module and/or the spam template library maintenance module to modify the corresponding data records in the legitimate mail list library and/or the spam template library.
5. The device according to claim 3, characterized in that the mail processing device is deployed on a personal user terminal, and the general mail control unit, upon a user request, sends a modification notice to the mail list library maintenance module and/or the spam template library maintenance module to modify the corresponding data records in the legitimate mail list library and/or the spam template library.
6. The device according to claim 5, characterized in that, when the mail receiving/sending unit receives a mail, the anti-spam processing unit determines the type of the mail, attaches the corresponding type identifier, and delivers the mail to the general mail control unit;
the general mail control unit stores the mail in the corresponding location according to the type identifier and displays it to the user: a mail with a legitimate mail identifier is stored in the inbox, and a mail with a spam identifier is stored in the junk folder.
7. The device according to claim 3, characterized in that the mail processing device is deployed on a mail server.
8. An anti-spam processing method, applied to a mail processing device comprising a mail receiving/sending unit, a general mail control unit, an anti-spam processing unit and a mail database, characterized in that the mail database comprises a legitimate mail list library storing legitimate mail templates, a spam template library storing spam templates, and an all-mail sample library storing unprocessed mail samples, the method comprising the following steps:
a. when receiving/sending a mail, the mail receiving/sending unit delivers the mail to be received/sent to the anti-spam processing unit and stores it in the all-mail sample library;
b. the anti-spam processing unit compares, according to a comparison strategy, the legitimate mail templates in the legitimate mail list library one by one with the mail to be received/sent and judges whether a mail similar to the mail to be received/sent exists; if so, it tags the mail to be received/sent with a legitimate mail identifier; otherwise, step c is executed;
c. the anti-spam processing unit compares, according to the comparison strategy, the spam templates in the spam template library one by one with the mail to be received/sent and judges whether a mail similar to the mail to be received/sent exists; if so, it tags the mail to be received/sent with a spam identifier; otherwise, step d is executed;
d. the anti-spam processing unit searches the all-mail sample library for stored mails similar to the mail to be received/sent and judges whether the number of similar mails exceeds a predetermined value; if so, it extracts the content that the mail to be received/sent shares with its similar mails according to the Alignment comparison algorithm to generate a new mail template, determines the type of the mail to be received/sent according to preset information, tags the mail with the corresponding type identifier, and saves the mail template into the corresponding mail database according to the determined mail type;
e. the anti-spam processing unit delivers the type-tagged mail to be received/sent to the general mail control unit for processing control, or returns it to the mail receiving/sending unit.
9. The method according to claim 8, characterized in that, in step b or c, the anti-spam processing unit obtains the comparison result according to the Alignment comparison algorithm.
10. The method according to claim 8, characterized in that, in step d, the anti-spam processing unit determines the mail type according to a blacklist and a whitelist.
11. The method according to claim 8, characterized in that the comparison strategy comprises: comparing mail contents, comparing mail formats, comparing the communication command sequences used to send the mails, or any combination of these three.
12. The method according to claim 8, characterized in that, in step b or step c, judging whether a mail similar to the mail to be received/sent exists specifically comprises: presetting a similarity threshold and judging whether the result of the one-by-one comparison exceeds the corresponding similarity threshold; if it does, the mail is a similar mail of the mail to be received/sent; otherwise, it is not.
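The comparison strategy of claim 11 and the threshold test of claim 12 can be illustrated with the sketch below. The representation is an assumption: each mail is a dict holding three token lists, "content", "format" (for example MIME types) and "commands" (the SMTP command sequence). Averaging the scores of the selected aspects is likewise an assumption, since the claims do not say how a combination is scored; `aspect_similarity` and `is_similar` are hypothetical names.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical mail representation: each aspect named in claim 11 is a token list.
Mail = dict[str, list[str]]


def aspect_similarity(a: list[str], b: list[str]) -> float:
    # Order-aware match ratio in [0, 1] for a single aspect.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_similar(mail: Mail, template: Mail, strategy: tuple[str, ...],
               threshold: float) -> bool:
    # Claim 11: the strategy selects any combination of "content", "format"
    # and "commands". Averaging the selected scores is an assumption.
    score = mean(aspect_similarity(mail[k], template[k]) for k in strategy)
    # Claim 12: similar only if the comparison result exceeds the preset threshold.
    return score > threshold


mail = {
    "content": "win a free prize now".split(),
    "format": ["text/html"],
    "commands": ["EHLO", "MAIL FROM", "RCPT TO", "DATA", "QUIT"],
}
template = {
    "content": "win a free prize today".split(),
    "format": ["text/html"],
    "commands": ["HELO", "MAIL FROM", "RCPT TO", "DATA", "QUIT"],
}
```

Here the content and command sequences each score 0.8 and the format scores 1.0, so a content-only comparison passes a 0.75 threshold but not a 0.8 one. The combined three-aspect mean, about 0.87, passes 0.85.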

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200610001083A CN100589453C (en) 2006-01-16 2006-01-16 Processing device and method for anti-junk mails

Publications (2)

Publication Number Publication Date
CN101005462A CN101005462A (en) 2007-07-25
CN100589453C true CN100589453C (en) 2010-02-10

Family

ID=38704332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200610001083A Active CN100589453C (en) 2006-01-16 2006-01-16 Processing device and method for anti-junk mails

Country Status (1)

Country Link
CN (1) CN100589453C (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Spam filtering technology based on ART neural networks (基于ART神经网络的垃圾邮件过滤技术). 马凤云 (Ma Fengyun), 刘培玉 (Liu Peiyu). 信息技术与信息化 (Information Technology and Informatization), No. 4, 2005 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant