CN101594312B - Method for recognizing junk mail based on artificial immunity and behavior characteristics - Google Patents

Method for recognizing junk mail based on artificial immunity and behavior characteristics

Info

Publication number
CN101594312B
Authority
CN
China
Prior art keywords
antibody
spam
mail
characteristic
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200810044484
Other languages
Chinese (zh)
Other versions
CN101594312A (en)
Inventor
何兴高
钟婷
程红蓉
陈佳
曾志华
文思群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN 200810044484
Publication of CN101594312A
Application granted
Publication of CN101594312B
Current legal status: Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to Internet technology and discloses a method for generating a mail behavior characteristic library, a method for judging junk mail, and a method for updating a junk mail characteristic library. In the technical scheme provided by the embodiments of the invention, a behavior characteristic library is generated from mails of known classification, and the generated characteristics are used to judge mails of unknown classification. When an unclassified mail is recognized, the behavior characteristic similarity between the unknown mail and the known junk mails in an antibody library is calculated, and a score set is also provided. By calculating the total score, the extent to which the behavior characteristics of the unknown mail tend toward those of junk mail is examined. With this double standard, junk mail can be recognized more accurately. When the database is updated, the recognized junk mails are used to update the antibody library through a clonal variation algorithm, so that the library adapts better to the variation trend of junk mail behavior characteristics and of junk mail itself within a given period of time.

Description

A spam recognition method based on artificial immunity and behavior features
Technical Field
The present invention relates to Internet technology, and in particular to a method for generating a mail behavior feature library, a method for judging spam, and a method for updating a spam feature library.
Background Art
In recent years, with the development of the Internet, the spam problem has attracted more and more attention. Spam constantly plagues Internet users, wastes network resources, and may even cause other, more serious social problems. As technology has developed, spam filtering has also received increasing attention and development.
Most anti-spam products still rely on IP filtering, keyword filtering, intelligent content filtering based on Bayesian statistical algorithms, RBL filtering and similar methods to identify spam. However, their false-alarm rate is high, their processing performance is low, and they depend strongly on language. This is because these techniques have not escaped the limitations of content-matching filtering: they must receive the complete mail, segment it into words according to a specified language, and match the words one by one against a dictionary with millions of entries in order to estimate whether the mail is spam.
Comparing spam with normal mail, the most fundamental difference lies in their purposes. To achieve an illegitimate aim, a spammer exhibits behavior in the course of using e-mail that differs from normal usage, and leaves corresponding traces on the network. On this basis, recognizing spam by its behavior can increase mail filtering speed while avoiding the false-alarm problem that is unavoidable in content filtering, making spam filtering more efficient and accurate.
In recent years, intelligent spam detection techniques have shown stronger learning ability and adaptability than conventional methods. Among intelligent mail processing methods, machine learning and artificial neural network models have been studied in depth. The main spam detection methods at present are naive Bayes, support vector machines (SVM), artificial neural networks (ANN), and immunity-based spam filtering.
Applying an artificial immune model to spam processing mainly draws on the self/non-self detection principle of immunity and on the concept of a detector. n spam behavior features are extracted, each feature acting as one chromosome position, so that every mail can be converted into a gene cell with an n-position chromosome. An antibody set, that is, the antibody library, is first trained with spam of known class, and each unidentified mail is an antigen. The similarity (affinity) between the antigen and each antibody in the antibody library is then calculated; if the maximum similarity obtained is greater than a preset threshold, the antibody is considered to have recognized the antigen, and the mail is classified as spam. Depending on the similarity of each recognized antigen, the antibody library is then updated by clonal mutation, which both maintains the high recognition performance of the antibody set and makes it better adapted to the evolving trend of new spam.
Summary of the Invention
The purpose of the embodiments of the invention is to provide a method and device for generating a mail behavior feature library; a method, device and system for judging spam; and a method and device for updating a spam feature library. With the embodiments provided by the invention, the type of a mail can be judged and spam can thereby be filtered.
First, to solve the problems of the prior art, an embodiment of the invention proposes a method for generating a mail behavior feature library, comprising the following steps:
Reading the content of mails whose classification is known;
Applying a specific behavior feature extraction algorithm to the mail content to obtain antibody sets;
Applying a specific analysis algorithm to the antibody sets to perform probability analysis and obtain the final antibody library;
Applying a specific analysis algorithm to the antibody library to perform score calculation and obtain the final score set.
Correspondingly, an embodiment of the invention proposes a device for generating a mail behavior feature library, comprising:
A mail information reading unit, configured to read the content of mails whose classification is known;
A feature extraction unit, configured to extract behavior features that meet specified conditions from the mail content;
An antibody library generation unit, configured to perform probability analysis on the behavior features to obtain the final antibody set;
A feature score generation unit, configured to calculate the final score set from the feature probabilities.
Second, an embodiment of the invention also provides a method for judging spam, comprising the following steps:
Reading the content of a mail whose classification is unknown;
Parsing the format of the mail content;
Applying a specific behavior feature extraction algorithm to the parsed mail content to obtain an antigen;
Reading the antibody library and score set obtained by the specific mail behavior feature extraction algorithm and the probability algorithm;
Applying a specific recognition algorithm to the antigen;
Judging the mail according to the calculation result.
Correspondingly, an embodiment of the invention proposes a device for judging spam, comprising:
A mail information reading unit, configured to read the content of a mail whose classification is unknown;
An information content parsing unit, configured to parse the content of the mail whose classification is unknown;
A feature extraction unit, configured to extract behavior features that meet specified conditions from the mail content;
An antibody library and score set reading unit, configured to read the antibody library and the corresponding score set;
A computing unit, configured to take the behavior features, the antibody library and the score set as input and perform the calculation with a specific recognition algorithm;
A judging unit, configured to judge the type of the mail whose classification is unknown according to the calculation result of the computing unit.
Third, an embodiment of the invention also provides a method for updating a spam feature library, comprising the following steps:
Reading the behavior feature information of a spam mail that has been recognized;
Judging the similarity of the antibody that recognized the spam;
Applying a specific clonal mutation algorithm to the antibody to clone and mutate it;
Applying a specific recognition algorithm to the antigen and the new antibodies produced by mutation;
Updating the mail antibody library according to the calculation result.
Correspondingly, an embodiment of the invention proposes a device for updating a spam feature library, comprising:
A spam information reading unit, configured to read the behavior feature information of a spam mail that has been recognized;
A similarity judging unit, configured to judge the similarity of the antibody that recognized the spam and to decide whether clonal mutation is needed;
A clonal mutation unit, configured to clone and mutate the antibody to produce new antibodies;
A computing unit, configured to apply a specific recognition algorithm to the antigen and the new antibodies produced by mutation;
A judging and updating unit, configured to judge the calculation result of the computing unit and update the antibody library.
Finally, the embodiments of the invention have the following beneficial effects:
As can be seen from the technical scheme above, the embodiments of the invention generate a behavior feature library from mails of known classification and use the generated features to judge mails of unknown classification. When an unclassified mail is recognized, the behavior feature similarity between the unknown mail and the known spam in the antibody library is calculated, and a score set is also provided. By calculating the total score, the degree to which the behavior features of the unknown mail approach those of spam is examined. With this double standard, spam can be recognized more accurately. When the database is updated, the recognized spam is used to update the antibody library through a clonal mutation algorithm, so that the library adapts better to the variation trend of spam behavior features and of spam itself within a given period.
Brief Description of the Drawings
Fig. 1 is a detailed flowchart of embodiment one of the mail behavior feature library generation method of the invention;
Fig. 2 is a detailed flowchart of embodiment one of the spam judgment method of the invention;
Fig. 3 is a detailed flowchart of embodiment one of the spam feature library update method of the invention;
Fig. 4 is a structural diagram of embodiment one of the mail behavior feature library generating device of the invention;
Fig. 5 is a structural diagram of embodiment one of the spam judgment device of the invention;
Fig. 6 is a structural diagram of embodiment one of the spam feature library updating device of the invention;
Fig. 7 is a structural diagram of embodiment one of the spam judgment system of the invention.
Detailed Description
To make the purpose, technical scheme and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, embodiment one of the mail behavior feature library generation method provided by the invention comprises:
Step 101: read the content of mails whose classification is known;
A mail of known classification is one whose class is known, that is, it has been established whether the mail is normal mail or spam;
Step 102: apply a specific behavior feature extraction algorithm to the mail content to obtain antibody sets;
The specific behavior feature extraction algorithm used here performs feature extraction on the mail content (both the mail header and the mail body). The behavior features that a mail may exhibit are extracted, such as: easily forged fields in the mail header, fields involving DNS resolution, fields written in a non-standard way, and forged fields in the routing information; and fields such as "www", "http://" and "" in the mail body. A feature that is present is recorded as 1 and otherwise as 0, so each mail can be recorded as a fixed-length array whose every position holds 0 or 1.
The feature information extracted from the known mails is used to generate a spam antibody set and a normal-mail antibody set respectively.
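As an illustration of step 102, the following Python sketch turns one mail into a fixed-length 0/1 array. It is only a sketch under stated assumptions: the patent names categories of behavior features but not the exact tests, so the three checks here (`has_link`, `missing_message_id`, `missing_date`) are hypothetical placeholders, and the input is assumed to be already split into a header dictionary and a body string.

```python
# Hypothetical behavior checks. The source lists feature categories only,
# so these concrete tests are illustrative assumptions.
def has_link(headers, body):
    return "www" in body or "http://" in body

def missing_message_id(headers, body):
    return "Message-ID" not in headers

def missing_date(headers, body):
    return "Date" not in headers

# The order is fixed so that position i means the same feature in every array.
FEATURE_CHECKS = [has_link, missing_message_id, missing_date]

def extract_features(headers, body):
    """Return one 0/1 flag per check, always in the same order."""
    return [1 if check(headers, body) else 0 for check in FEATURE_CHECKS]

headers = {"From": "a@example.com", "Date": "Tue, 1 Jan 2008 00:00:00 +0000"}
vector = extract_features(headers, "Visit http://example.com today")
print(vector)
```

Because the order of `FEATURE_CHECKS` never changes, antibodies, antigens and scores can later be compared position by position.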
Step 103: apply a specific analysis algorithm to the antibody sets to perform probability analysis and obtain the final antibody library;
Using the spam antibody set and the normal-mail antibody set, an antibody library that effectively recognizes spam is generated through analysis and calculation.
The invention uses the tolerance principle and the negative selection (reverse selection) algorithm from immune theory.
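The source does not spell out how the tolerance principle and negative selection are applied, so the sketch below is only one plausible reading: a spam antibody is kept only if it does not match any normal-mail ("self") antibody. The matching rule (fraction of equal positions) and the `self_threshold` parameter are assumptions introduced for illustration.

```python
def similarity(a, b):
    """Fraction of positions where two equal-length 0/1 arrays agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def negative_selection(spam_set, normal_set, self_threshold=1.0):
    """Keep spam antibodies that do not match any normal-mail antibody."""
    library = []
    for candidate in spam_set:
        matches_self = any(
            similarity(candidate, normal) >= self_threshold
            for normal in normal_set
        )
        # Also skip duplicates so each antibody is stored once.
        if not matches_self and candidate not in library:
            library.append(candidate)
    return library

spam_set = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
normal_set = [[0, 0, 0, 1], [0, 0, 0, 0]]
library = negative_selection(spam_set, normal_set)
print(library)
```

With `self_threshold=1.0` only exact matches to normal mail are removed; lowering it would make the library more conservative about anything that resembles normal mail.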
Step 104: apply a specific analysis algorithm to the antibody library to perform score calculation and obtain the final score set;
The spam antibody set and the normal-mail antibody set are read in, and the score of each feature item is calculated to obtain the final score set.
The invention further provides a specific method for calculating the scores. When the spam set and the normal-mail set are equal in size, the number of times each feature appears in the spam set is accumulated, the number of times each feature appears in the normal-mail set is accumulated likewise, and the absolute value of the difference between the two counts is taken. This value reflects how much each feature contributes to distinguishing spam from normal mail; the values obtained for all features are stored as a score set.
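The score calculation just described can be sketched in a few lines of Python. The function name `build_score_set` and the sample data are illustrative; the rule itself (per-feature count in each set, then the absolute difference, with both sets equal in size) follows the text.

```python
def build_score_set(spam_set, normal_set):
    """Score each feature by |count in spam set - count in normal-mail set|."""
    if len(spam_set) != len(normal_set):
        raise ValueError("the method assumes both sets have the same size")
    # zip(*rows) walks the arrays column by column, one column per feature.
    spam_counts = [sum(column) for column in zip(*spam_set)]
    normal_counts = [sum(column) for column in zip(*normal_set)]
    return [abs(s - n) for s, n in zip(spam_counts, normal_counts)]

spam_set = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
normal_set = [[0, 1, 0], [0, 0, 0], [1, 1, 0]]
scores = build_score_set(spam_set, normal_set)
print(scores)
```

In this sample the second feature appears equally often in both sets, so it scores 0 and contributes nothing to telling spam from normal mail.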
As shown in Fig. 2, embodiment one of the spam judgment method provided by the invention comprises:
Step 201: read the content of a mail whose classification is unknown;
For an unknown mail that needs to be judged, its mail information is read so that its format can be parsed in the next step.
Step 202: parse the format of the mail content;
Format parsing here means parsing the information according to its protocol and representing it as recognizable text content, for example using the Multipurpose Internet Mail Extensions (MIME) format to build a mail format tree;
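For readers who want to see what such MIME parsing can look like, here is a small sketch using Python's standard `email` package. The source does not name any library, so this choice, the helper name `parse_mail` and the sample message are assumptions.

```python
from email import message_from_string

def parse_mail(raw_mail):
    """Split a raw message into headers and a list of (content type, text) leaves."""
    msg = message_from_string(raw_mail)
    # Note: a plain dict keeps only one value per header name.
    headers = dict(msg.items())
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container node of the MIME tree; its children hold the content
        parts.append((part.get_content_type(), part.get_payload()))
    return headers, parts

raw = (
    "From: sender@example.com\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--XX--\n"
)
headers, parts = parse_mail(raw)
print(headers["Subject"], [content_type for content_type, _ in parts])
```

A real extractor would probably keep repeated headers such as `Received`, since the routing information is one of the feature sources named in the text; `msg.get_all("Received")` returns them as a list.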
Step 203: apply the specific behavior feature extraction algorithm to the parsed mail content to obtain an antigen;
The specific behavior feature extraction algorithm used here performs feature extraction on the mail content (both the mail header and the mail body). A feature that is present is recorded as 1 and otherwise as 0, so the mail can be recorded as a fixed-length array whose every position holds 0 or 1.
Step 204: read the antibody library and score set obtained by the specific mail behavior feature extraction algorithm and the probability algorithm;
The antibody set in the antibody library is read in as a number of antibodies, and the score set is matched to them feature by feature;
Step 205: apply a specific recognition algorithm to the antigen;
The similarity between the antibody and the antigen is calculated for each feature, and the scores of the features that are similar are accumulated, finally giving the total similarity and total score of the mail.
Step 206: judge the mail according to the calculation result.
Whether the total similarity and the total score exceed specific thresholds is judged in order to identify the mail type. If both the total similarity and the total score obtained are greater than the prescribed thresholds, the mail is judged to be spam; otherwise it is judged to be non-spam. The thresholds are user-defined and can be derived from repeated experiments.
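Steps 205 and 206 can be sketched as follows. Total similarity is taken here as the number of positions where antibody and antigen agree, and total score as the sum of the scores at those positions. How results from several antibodies are combined is not stated in this passage; the sketch assumes the best-matching antibody decides, and the function names and threshold values are illustrative.

```python
def match(antibody, antigen, scores):
    """Return (total similarity, total score) for one antibody/antigen pair."""
    total_similarity = 0
    total_score = 0
    for ab_bit, ag_bit, score in zip(antibody, antigen, scores):
        if ab_bit == ag_bit:  # this feature is similar
            total_similarity += 1
            total_score += score
    return total_similarity, total_score

def is_spam(antigen, library, scores, sim_threshold, score_threshold):
    """Judge a mail as spam only if both totals exceed their thresholds."""
    if not library:
        return False
    # Tuples compare by similarity first, then by score.
    best_similarity, best_score = max(match(ab, antigen, scores) for ab in library)
    return best_similarity > sim_threshold and best_score > score_threshold

library = [[1, 1, 0, 1], [1, 1, 1, 1]]
scores = [3, 2, 1, 0]
print(is_spam([1, 1, 0, 0], library, scores, sim_threshold=2, score_threshold=5))
print(is_spam([0, 0, 1, 0], library, scores, sim_threshold=2, score_threshold=5))
```

Requiring both conditions is the "double standard" the text refers to: a mail that agrees with an antibody on many low-score features still fails the score test.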
As shown in Fig. 3, embodiment one of the spam feature library update method provided by the invention comprises:
Step 301: read the behavior feature information of a spam mail that has been recognized;
The information of the antigen identified as spam and of the antibody that recognized this antigen is read.
Step 302: judge the similarity of the antibody that recognized the spam;
Whether the similarity obtained when the antigen was recognized equals the maximum similarity is judged. If it does, clonal mutation is not needed; if it does not, clonal mutation is performed.
Step 303: apply a specific clonal mutation algorithm to the antibody to clone and mutate it;
The cloning algorithm and the mutation algorithm are applied to the antibody that recognized the antigen, generating new antibodies.
Step 304: apply the specific recognition algorithm to the antigen and the new antibodies produced by mutation;
Recognition is calculated between the antigen and each new antibody: the similarity of each feature is calculated and the scores of the features that are similar are accumulated, finally giving the total similarity and total score of the mail.
Step 305: update the mail antibody library with a specific method according to the calculation result.
The specific method here is that if the total similarity is greater than the similarity obtained when the original antibody made the recognition, the new antibody replaces the original antibody, thereby updating the antibody library.
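The update steps above can be sketched as a single routine. The number of copies is made inversely proportional to the similarity and each copy has one randomly chosen position flipped, as the device description for Fig. 6 states; the concrete rule `clone_budget // similarity` and the parameter `clone_budget` are assumptions, since the source gives no formula.

```python
import random

def similarity(a, b):
    """Number of positions where two equal-length 0/1 arrays agree."""
    return sum(x == y for x, y in zip(a, b))

def clone_and_mutate(antibody, sim, rng, clone_budget=8):
    """Copy the antibody and flip one randomly chosen position in each copy."""
    copies = max(1, clone_budget // max(sim, 1))  # fewer copies at higher similarity
    mutants = []
    for _ in range(copies):
        mutant = list(antibody)
        position = rng.randrange(len(mutant))
        mutant[position] = 1 - mutant[position]
        mutants.append(mutant)
    return mutants

def update_library(library, index, antigen, rng):
    """Replace library[index] if a mutant matches the recognized spam better."""
    original = library[index]
    old_similarity = similarity(original, antigen)
    if old_similarity == len(antigen):  # already at maximum similarity
        return False
    mutants = clone_and_mutate(original, old_similarity, rng)
    best = max(mutants, key=lambda m: similarity(m, antigen))
    if similarity(best, antigen) > old_similarity:
        library[index] = best
        return True
    return False

rng = random.Random(0)  # seeded only to make the demo repeatable
library = [[0, 0, 0, 0]]
antigen = [1, 1, 1, 1]
changed = update_library(library, 0, antigen, rng)
print(changed, library[0])
```

In this demo every possible single-bit flip moves the antibody closer to the antigen, so the replacement always happens; with real data a round of mutation may produce no improvement, in which case the original antibody is kept.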
As shown in Fig. 4, embodiment one of the mail behavior feature library generating device provided by the invention comprises:
Mail information reading unit 401, configured to read the content of mails whose classification is known;
The mails of known classification should include both normal mail and spam. This ensures that the mail feature library is comprehensive, and hence that the mail behavior features are accurate.
Feature extraction unit 402, configured to extract behavior features that meet specified conditions from the mail content;
These features should be representative behavior features that appear in the mail header and the mail body. As many as possible of the behavior features that spam can exhibit should be summarized and counted, such as: easily forged fields in the mail header, fields involving DNS resolution, fields written in a non-standard way, and forged fields in the routing information; and "www", "http://" and "" in the mail body. A feature that is present is recorded as 1 and otherwise as 0, so each mail can be recorded as a fixed-length array whose every position holds 0 or 1.
Antibody library generation unit 403, configured to perform probability analysis on the behavior features to obtain the final antibody set;
Depending on the class of the known mail, the behavior features extracted from it produce an antibody, which is stored in the spam antibody library or the normal-mail antibody library accordingly.
Feature score generation unit 404, configured to calculate the final score set from the feature probabilities with a specific method;
The invention provides a method for calculating the scores. When the spam set and the normal-mail set are equal in size, the number of times each feature appears in the spam set is accumulated, the number of times each feature appears in the normal-mail set is accumulated likewise, and the absolute value of the difference between the two counts is taken. This value reflects how much each feature contributes to distinguishing spam from normal mail; the value obtained for each feature is treated as a score, and all the scores are stored together as a score set.
As can be seen from the above, these antibodies can be adjusted and revised dynamically, and the spam filter also uses statistics to "learn" from the information it receives and automatically adjust the score values of junk information. The filtering can therefore be updated to follow the changing trend of spam behavior features at different stages.
As shown in Fig. 5, embodiment one of the spam judgment device provided by the invention comprises:
Mail information reading unit 501, configured to read the content of a mail whose classification is unknown;
Information content parsing unit 502, configured to parse the content of the mail whose classification is unknown;
The format of the mail content is parsed. Format parsing here means parsing the information according to its protocol and representing it as recognizable text content, for example using the Multipurpose Internet Mail Extensions (MIME) format to build a mail format tree;
Feature extraction unit 503, configured to extract behavior features that meet specified conditions from the mail content;
The specific behavior feature extraction algorithm used here performs behavior feature extraction on the mail content (both the mail header and the mail body). A feature that is present is recorded as 1 and otherwise as 0, so the mail can be recorded as a fixed-length array whose every position holds 0 or 1.
Specific antibody library and score set reading unit 504, configured to read the specific antibody library and the corresponding score set;
The antibody set in the antibody library is read in as a number of antibodies, and the score set is matched to them feature by feature;
Computing unit 505, configured to take the behavior features, the antibody library and the score set as input and perform the calculation with a specific recognition algorithm;
The specific recognition algorithm compares the antibody array with the antigen array position by position to see whether the values are equal (similar), calculates the similarity between the antibody and the antigen for each feature, and accumulates the scores of the features that are similar, finally giving the total similarity and total score of the mail.
Judging unit 506, configured to take the calculation result of the computing unit as input and judge the type of the mail whose classification is unknown with a specific decision algorithm.
The specific decision algorithm judges whether the total similarity and the total score exceed specific thresholds in order to identify the mail type. If both the total similarity and the total score obtained are greater than the prescribed thresholds, the mail is judged to be spam; otherwise it is judged to be non-spam. The thresholds are user-defined and can be derived from repeated experiments.
As can be seen from the above, the mail is extracted into array form before matching, and the score set is likewise recorded as an array. Because the total number of features is fixed, the matching and judgment will not be very slow in practice and can meet actual needs.
As shown in Fig. 6, embodiment one of the spam feature library updating device provided by the invention comprises:
Spam information reading unit 601, configured to read the behavior feature information of a spam mail that has been recognized;
The behavior feature information of the recognized spam and the information of the antibody that recognized it are read;
Similarity judging unit 602, configured to take the similarity of the antibody that recognized the spam and decide with a specific algorithm whether clonal mutation is needed;
The specific method is to judge whether the similarity of the antibody that recognized the spam equals the maximum similarity. If it does, the subsequent modules need not be run; if it does not, the clonal mutation module is entered.
Clonal mutation unit 603, configured to clone and mutate the antibody, producing new antibodies with a specific clonal mutation algorithm;
The specific clonal mutation algorithm consists of two parts, cloning and mutation. Cloning means copying the antibody several times (the number of copies is inversely proportional to the similarity, that is, the higher the similarity, the fewer the copies). Each copy is then mutated: one position of the array a changes from 0 to 1 or from 1 to 0. To simulate the randomness and comprehensiveness of biological immunity, the position to mutate is chosen at random.
Computing unit 604, configured to apply the specific recognition algorithm to the antigen and the new antibodies produced by mutation;
The recognition algorithm is used to calculate the similarity between the antigen and each new antibody produced by mutation.
Judging and updating unit 605, configured to judge the calculation result of the computing unit and update the antibody library.
If any similarity is greater than the similarity obtained when the original antibody made the recognition, the new antibody replaces the original antibody, thereby updating the antibody library.
As can be seen from the above, because clonal mutation is used, the antibody library is updated accordingly after a mail is recognized, so that it adapts better to the variation trend of spam behavior features within a given period.
Further, the invention provides embodiment one of a network information type judgment system, which, as shown in Fig. 7, comprises:
Mail behavior feature library generating device 701, configured to generate the feature library and score set of mails;
This comprises reading the content of mails whose classification is known; extracting behavior features that meet specified conditions from the mail content; performing probability analysis on the specific behavior features with a specific analysis algorithm to obtain the final antibody set; and calculating the final score set from the feature probabilities with a specific algorithm.
Spam judgment device 702, configured to recognize spam automatically;
This comprises reading the content of a mail whose classification is unknown; parsing that content; extracting behavior features that meet specified conditions from the mail content; reading the specific antibody library and the corresponding score set; taking the specific behavior features, the antibody library and the score set as input and performing the calculation with a specific recognition algorithm; and taking the calculation result of the computing unit as input and judging the type of the mail with a specific algorithm.
Spam feature library updating device 703, configured to update the spam feature library in a timely manner;
This comprises reading the specific behavior feature information of a spam mail that has been recognized; taking the similarity of the antibody that recognized the spam and deciding with a specific algorithm whether clonal mutation is needed; cloning and mutating the antibody, producing new antibodies with a specific clonal mutation algorithm; applying the specific recognition algorithm to the antigen and the new antibodies produced by mutation; and updating the antibody library with a specific decision algorithm according to the calculation result of the computing unit.
The method and device for generating a mail behavior feature library; the method, device and system for judging spam; and the method and device for updating a spam feature library provided by the embodiments of the invention have been described in detail above. The description of the embodiments is intended only to help in understanding the method of the invention and its idea. For a person of ordinary skill in the art, the specific implementation and the scope of application may both vary in accordance with the idea of the invention. In summary, this description should not be construed as limiting the invention.

Claims (3)

1. mail behavioural characteristic library generating method; It is characterized in that; The behavioural characteristic that at first possibly occur the rubbish contents that comprises mail head and mail body is carried out feature extraction, as: field, the field of dns resolution, the mail forged easily among the mail head are write the field of forging in nonstandard field, the routing iinformation; " www " that has in the mail body or " http: // " and fields such as " " have certain characteristic and are designated as 1, otherwise be designated as 0, have obtained writing the spam antibody collection and the normal email antibody collection of 0 or 1 fixed length array in view of the above; Adopt described antibody collection to use tolerance principle in the immunity principle to carry out probability analysis and obtain final antibody library with reverse selection algorithm; Said antibody library is carried out fractional computation; When the spam collection equates with normal email collection quantity; The number of times that each characteristic that adds up concentrate to occur at spam, each characteristic that adds up is simultaneously concentrated the number of times that occurs in normal email, obtains the absolute value that twice numbers subtract each other; In order to embody the effect size that every kind of characteristic plays in distinguishing spam and normal email, the value that each characteristic is asked is preserved into a branch manifold.
2. A spam judgment method, characterized in that: the content of an unknown, unclassified mail is read and parsed, and behavioral features are extracted from the mail content using the feature library generation method of claim 1; using the tolerance principle of immunology and the negative selection algorithm, with the spam antibody set and the normal mail antibody set, an antibody library that effectively recognizes spam is generated through analysis and calculation; the spam antibody set and the normal mail antibody set are read in; when the spam set and the normal mail set are equal in size, the number of times each feature occurs in the spam set is accumulated, the number of times each feature occurs in the normal mail set is accumulated, the absolute value of the difference between the two counts is obtained, and the value obtained for each feature is saved into a score set; with said behavioral features, the antibody library, and the score set as input, the antibodies in the antibody library are read in to form a number of antibody arrays, and said score set is matched to them feature by feature; the antibody array and the antigen array are compared position by position to determine whether they are similar, the similarity between the antibody and the antigen is calculated for each feature, and the scores of the similar features are accumulated, finally yielding the total similarity and total score of the mail; if the total similarity and the total score are greater than the specified thresholds, the mail is judged to be spam, otherwise it is judged not to be spam; the thresholds here are user-defined and can be derived from repeated experimental results.
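The matching step of claim 2 can be sketched as follows. Several details are assumptions, since the claim does not fix them: two positions are treated as "similar" when their bits are equal, total similarity is taken as the fraction of equal positions, and a mail is judged spam as soon as one antibody exceeds both thresholds. The names `match` and `is_spam` are hypothetical.

```python
from typing import List, Sequence, Tuple


def match(antibody: Sequence[int], antigen: Sequence[int],
          scores: Sequence[int]) -> Tuple[float, int]:
    """Compare antibody and antigen position by position.

    Returns (total similarity, total score). Similarity is the fraction of
    positions whose bits are equal (an assumption); the total score adds up
    the score-set entries of exactly those positions.
    """
    if not antibody or not (len(antibody) == len(antigen) == len(scores)):
        raise ValueError("arrays must be non-empty and of equal length")
    same = [a == g for a, g in zip(antibody, antigen)]
    similarity = sum(same) / len(antibody)
    total_score = sum(s for s, hit in zip(scores, same) if hit)
    return similarity, total_score


def is_spam(antigen: Sequence[int], antibodies: List[Sequence[int]],
            scores: Sequence[int], sim_threshold: float,
            score_threshold: float) -> bool:
    """Judge the mail as spam if any antibody exceeds both user-defined thresholds."""
    for antibody in antibodies:
        similarity, total_score = match(antibody, antigen, scores)
        if similarity > sim_threshold and total_score > score_threshold:
            return True
    return False


if __name__ == "__main__":
    library = [[1, 1, 0, 1]]
    print(is_spam([1, 1, 0, 0], library, [3, 2, 1, 4], 0.7, 5))
```

Requiring both conditions reflects the double standard described in the abstract: a mail must resemble a known spam antibody and must do so on features that actually discriminate spam from normal mail.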
3. A spam feature library update method, characterized in that: the information of an antigen identified as spam and the information of the antibody that recognized this antigen are read; it is judged whether the similarity of the antibody when it identified said spam equals the maximum similarity; if equal, no subsequent module is entered; if not equal, the clonal mutation module is entered; a preset clonal mutation algorithm is then applied to said antibody: the antibody is copied several times, the number of copies being inversely proportional to the similarity, i.e. the higher the similarity, the fewer the copies; each copy is mutated, i.e. one position of its array changes from 0 to 1 or from 1 to 0; to simulate the randomness and comprehensiveness of biological immunity, the position of the mutation is chosen at random; recognition is computed between said antigen and each new antibody: the similarity of each feature is calculated and the scores of the similar features are accumulated, finally yielding the total similarity and total score of the mail; if a new antibody has a similarity greater than that of the original antibody at recognition, the new antibody replaces the original antibody, thereby updating the antibody.
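The clonal mutation update of claim 3 can be sketched in Python under stated assumptions. The claim says only that the number of copies is inversely proportional to the similarity; the linear decreasing mapping in `clone_count`, the cap `max_clones`, and the function names are hypothetical choices made for this illustration, and similarity is again taken as the fraction of equal positions.

```python
import random
from typing import List, Optional, Sequence


def similarity(antibody: Sequence[int], antigen: Sequence[int]) -> float:
    """Fraction of positions at which antibody and antigen agree (an assumption)."""
    return sum(a == g for a, g in zip(antibody, antigen)) / len(antibody)


def clone_count(sim: float, max_clones: int = 10) -> int:
    """Fewer copies for higher similarity; a simple decreasing stand-in
    for the 'inversely proportional' rule of the claim."""
    return max(1, round(max_clones * (1.0 - sim)))


def update_antibody(antibody: Sequence[int], antigen: Sequence[int],
                    rng: Optional[random.Random] = None,
                    max_clones: int = 10) -> List[int]:
    """Return the antibody that should be stored after recognizing `antigen`."""
    rng = rng or random.Random()
    base = similarity(antibody, antigen)
    if base == 1.0:
        # Maximum similarity already reached: skip the clonal mutation module.
        return list(antibody)
    best, best_sim = list(antibody), base
    for _ in range(clone_count(base, max_clones)):
        clone = list(antibody)
        pos = rng.randrange(len(clone))  # mutation position chosen at random
        clone[pos] ^= 1                  # flip 0 -> 1 or 1 -> 0
        sim = similarity(clone, antigen)
        if sim > best_sim:               # keep only clones that beat the original
            best, best_sim = clone, sim
    return best


if __name__ == "__main__":
    print(update_antibody([1, 1, 0, 0], [1, 1, 1, 1], random.Random(0)))
```

Because each copy differs from the original in exactly one position, a single update moves the stored antibody at most one bit closer to the recognized spam, so the library drifts gradually toward current spam behavior rather than jumping to it.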
CN 200810044484 2008-05-30 2008-05-30 Method for recognizing junk mail based on artificial immunity and behavior characteristics Expired - Fee Related CN101594312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810044484 CN101594312B (en) 2008-05-30 2008-05-30 Method for recognizing junk mail based on artificial immunity and behavior characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200810044484 CN101594312B (en) 2008-05-30 2008-05-30 Method for recognizing junk mail based on artificial immunity and behavior characteristics

Publications (2)

Publication Number Publication Date
CN101594312A CN101594312A (en) 2009-12-02
CN101594312B true CN101594312B (en) 2012-12-26

Family

ID=41408765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810044484 Expired - Fee Related CN101594312B (en) 2008-05-30 2008-05-30 Method for recognizing junk mail based on artificial immunity and behavior characteristics

Country Status (1)

Country Link
CN (1) CN101594312B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871887B (en) * 2016-05-12 2019-01-29 北京大学 Client-based individual electronic mail filtering system and filter method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004105332A2 (en) * 2003-05-15 2004-12-02 Brightmail, Inc. Method and apparatus for filtering email spam based on similarity measures
CN1941746A (en) * 2005-09-27 2007-04-04 腾讯科技(深圳)有限公司 Method and system against rubbish e-mails

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
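B. Sirisanyalak et al. An Artificial Immunity-Based Spam Detection System. 《Evolutionary Computation, 2007. CEC 2007. IEEE Congress》. 2007, 3392-3398. *
Zhang Chenggong (张成功). Research on the principles of artificial immune systems and their application in anti-spam technology. 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology》. 2005, full text. *
Hu Ke (胡可) et al. An anti-spam filtering mechanism based on an artificial immune system. 《Computer Applications》. 2005, Vol. 25 (No. 11), 2559-2561. *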

Also Published As

Publication number Publication date
CN101594312A (en) 2009-12-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121226

Termination date: 20160530