CN108763449A - A Chinese keyword rule generation method for spam filtering - Google Patents
- Publication number
- CN108763449A (application CN201810521174.0A)
- Authority
- CN
- China
- Prior art keywords
- rule
- keyword
- word
- spam
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/42—Mailbox-related aspects, e.g. synchronisation of mailboxes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a Chinese keyword rule generation method for spam filtering. The method comprises four main steps: obtaining keyword candidate words from a mail set, selecting keywords by feature extraction, collecting the trigger records of the keyword rules, and assigning a score to each keyword rule. Compared with current techniques, the proposed method improves keyword feature extraction by combining term frequency and document frequency, which reduces the influence of common words, and it computes rule scores with a neural network algorithm, which reduces the learning cost compared with a genetic algorithm. The invention addresses the poor timeliness of current Chinese keyword rules and can generate, from the mail data set that a specific user group provides and that group's own definition of spam, the keyword rules that best fit those users.
Description
Technical field
The present invention relates to the technical field of internet security, and in particular to a Chinese keyword rule generation method for spam filtering.
Background art
With the rapid development of the internet, and of the mobile internet in particular, the means of network communication have multiplied, yet email, the most widely used internet application, remains irreplaceable. Spam that floods the network wastes large amounts of network resources and increases the time users spend handling mail, and the spread of virus-bearing spam can directly cause huge economic losses. After decades of research worldwide, mature and varied anti-spam techniques have accumulated. They mainly include techniques that verify the sender's identity based on how mail is sent, such as blacklists, whitelists, SPF checks, and honeypots; filtering techniques based on user behavior, such as concurrency control and rate control; and content-based filtering methods, which draw on machine learning and statistics and fall into probability-based and rule-based methods. Among rule-based approaches, the open-source spam filter SpamAssassin performs well.
One class of SpamAssassin rules is the keyword rule. A keyword rule scans the mail header and mail body and checks whether they contain words that are common in spam, and each keyword rule is given a specific weight score. The SpamAssassin project maintains only English keyword rules, so these rules cannot check whether a Chinese email contains words that are common in spam. In 2004 CCERT developed a set of Chinese rules using term frequency statistics and a genetic algorithm, but the set has not been updated since 2006. The keywords common in spam change over time, so this rule set is no longer timely. When extracting keyword features, CCERT used term frequency statistics and chose the words with the highest term frequency in the spam set. This approach treats words that are common in both spam and normal mail as spam keywords, which is clearly unreasonable. CCERT also computed rule scores with the genetic algorithm shipped with older versions of SpamAssassin. Since version 3.4, SpamAssassin has switched to a neural network algorithm, which greatly reduces the learning time compared with the genetic algorithm. In addition, different users often judge spam by different standards.
In view of the problems above, a solution that generates Chinese keyword rules from a specific mail set is of great significance.
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the prior art by proposing a Chinese keyword rule generation method for spam filtering. From a specific mail data set provided by the user, the method automatically generates the Chinese keyword rules that best fit the user's needs, for use in rule-based spam filtering schemes.
To achieve the above object, the technical solution provided by the invention is a Chinese keyword rule generation method for spam filtering. The method preprocesses a given mail data set and takes all words in the mail header and mail body of each email in the data set as keyword candidate words. It selects keywords with a feature extraction method that combines term frequency and document frequency. It then filters the same mail data set with the resulting keyword rules to obtain the trigger records of those rules on spam and on normal email, and uses these trigger records as the input of a neural network algorithm. The neural network is trained by stochastic gradient descent until the filtering performance converges, and the trained weights are converted into rule scores. The resulting rules can be used in rule-based mail filtering solutions. The method specifically comprises the following steps:
1) Preprocess the mail data set by mail screening, mail parsing, and Chinese word segmentation to obtain the keyword candidate word set;
2) Compute the term frequency and document frequency of every word in the candidate word set, and select keywords from the candidate word set by feature extraction that compares term frequency first and document frequency second;
3) Collect the keyword trigger records of each email in the mail data set, and format the trigger data;
4) Assign scores to the keyword rules with a neural network algorithm, based on the keyword trigger records above.
In step 1), mail screening means rejecting the purely English files in the mail data set. Mail parsing means parsing the mail content according to the RFC822 and MIME protocols, splitting it into its parts, and selecting the mail header and mail body. Chinese word segmentation means segmenting the text content of the mail header and mail body with a Chinese word segmentation tool.
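The parsing part of step 1) can be sketched with Python's standard `email` package. This is a minimal illustration, not the patent's implementation: the function name `extract_subject_and_body` is hypothetical, only the Subject header and `text/plain` parts are kept, and word segmentation (done with a separate tool in the text) is left out.

```python
import email
from email import policy


def extract_subject_and_body(raw_bytes):
    """Parse an RFC 822 / MIME message and return (subject, plain-text body)."""
    # policy.default decodes encoded headers and text payloads into str.
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    subject = str(msg.get("Subject", ""))
    text_parts = []
    for part in msg.walk():
        # Keep only plain-text parts; multipart containers are skipped.
        if part.get_content_type() == "text/plain":
            text_parts.append(part.get_content())
    return subject, "\n".join(text_parts)
```

A message with an unknown or mislabeled charset may fail to decode, so a real pipeline would probably need error handling around `get_content()`.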
In step 2), features are selected by combining term frequency and document frequency to determine the keywords, through the following steps:
2.1) Compute term frequency and document frequency. Term frequency is the number of times a word occurs in the documents, and document frequency is the number of documents in which a candidate word appears;
2.2) Choose the N words with the highest term frequency in spam;
2.3) Filter the keywords with the formula spam(wi) / (spam(wi) + ham(wi)) > T%. Every wi that satisfies the formula is a keyword, where wi is a word in the set of N highest-frequency words, spam(wi) is the number of spam emails that contain wi, ham(wi) is the number of normal emails that contain wi, and T% is a configured threshold.
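Steps 2.1) to 2.3) can be sketched as follows. This is an illustrative sketch under assumptions: the function name `select_keywords` is hypothetical, each email is assumed to be already segmented into a list of tokens, and the threshold is passed as a fraction (0.7 for 70%).

```python
from collections import Counter


def select_keywords(spam_docs, ham_docs, top_n, threshold):
    """Select keywords from segmented emails (each email is a list of tokens)."""
    # 2.1) Term frequency in spam: total occurrences of each word.
    tf = Counter(tok for doc in spam_docs for tok in doc)
    # 2.1) Document frequency: number of emails that contain each word.
    spam_df = Counter(tok for doc in spam_docs for tok in set(doc))
    ham_df = Counter(tok for doc in ham_docs for tok in set(doc))
    # 2.2) Keep the top_n words with the highest term frequency in spam.
    candidates = [word for word, _ in tf.most_common(top_n)]
    # 2.3) Keep words whose spam share of document frequency exceeds the threshold.
    # spam_df[w] is at least 1 for every candidate, so the division is safe.
    return [w for w in candidates
            if spam_df[w] / (spam_df[w] + ham_df[w]) > threshold]
```

A word such as a greeting that is frequent in spam but also appears in many normal emails passes step 2.2) and is then dropped in step 2.3), which is the effect the text describes.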
In step 3), the open-source tool SpamAssassin is used to collect the keyword trigger records of each email in the mail data set, and the trigger data are formatted, through the following steps:
3.1) Disable all rules built into SpamAssassin and deactivate the Bayesian algorithm to eliminate the influence of other rules, then add the keyword rules generated in step 2);
3.2) Use the mass-check script provided by SpamAssassin to run SpamAssassin on every email in the training set, and record in a log all rules that each email triggers;
3.3) Post-process the log file to put the results into a structured form.
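The post-processing in step 3.3) might look like the sketch below. The log layout is an assumption made for illustration (whitespace-separated fields: a result flag, a score, a file path, then a comma-separated list of triggered rule names); the layout of a real mass-check log should be checked before reusing this. The function name `parse_hits` is hypothetical.

```python
def parse_hits(lines, rule_names):
    """Turn log lines into 0/1 rule-trigger vectors, one vector per email."""
    index = {name: i for i, name in enumerate(rule_names)}
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        # Assumed layout: flag, score, path, comma-separated rule names.
        hits = fields[3].split(",") if len(fields) > 3 else []
        row = [0] * len(rule_names)
        for name in hits:
            if name in index:  # ignore anything that is not a known rule
                row[index[name]] = 1
        rows.append(row)
    return rows
```

Each resulting row has one column per keyword rule, which is a convenient input shape for the score training in step 4).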
In step 4), a neural network algorithm assigns scores to the keyword rules generated in step 2), through the following steps:
4.1) First replicate the normal (non-spam) emails redundantly. The number of copies of a normal email is given by 1 + (number_of_test_hit) * ham_preference, where ham_preference is an input parameter that defaults to 2.0 and number_of_test_hit is the number of rules the email triggers;
4.2) Randomly assign each rule a weight within a specific range, where the range is determined by how many emails the rule triggers on;
4.3) Train with the neural network algorithm and stop after num_epochs iterations, where num_epochs is the number of training iterations. The weight_decay and bias parameters are specified for each iteration: weight_decay is the rate at which the weights decay in one iteration, and bias is an offset used to smooth statistical anomalies;
4.4) Delete the rules whose trained score is 0. The remaining rules are the final generated rules.
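Steps 4.1) to 4.4) can be sketched as a single-layer network trained by stochastic gradient descent. This is not SpamAssassin's perceptron implementation. The sigmoid output, the initial weight range, the treatment of `bias` as a learned intercept, the reading of the formula in 4.1) as a total copy count, and the function names `train_scores` and `nonzero_rules` are all assumptions made for illustration.

```python
import math
import random


def train_scores(rows, labels, num_epochs=100, learning_rate=0.02,
                 ham_preference=2.0, weight_decay=1.0, seed=0):
    """rows: 0/1 trigger vectors; labels: 1 for spam, 0 for normal email."""
    rng = random.Random(seed)
    # 4.1) Replicate each normal email 1 + hits * ham_preference times.
    data = []
    for x, y in zip(rows, labels):
        copies = 1 if y == 1 else 1 + int(sum(x) * ham_preference)
        data.extend([(x, y)] * copies)
    # 4.2) Random initial weights (this fixed range is an assumption).
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(rows[0]))]
    bias = 0.0
    # 4.3) Stochastic gradient descent for a fixed number of epochs.
    for _ in range(num_epochs):
        rng.shuffle(data)
        for x, y in data:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = y - 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                if xi:
                    weights[i] += learning_rate * err
            bias += learning_rate * err
        # weight_decay = 1.0 leaves the weights unchanged.
        weights = [w * weight_decay for w in weights]
    return weights, bias


def nonzero_rules(rule_names, weights, digits=3):
    """4.4) Drop rules whose rounded score is 0."""
    scores = {n: round(w, digits) for n, w in zip(rule_names, weights)}
    return {n: s for n, s in scores.items() if s != 0}
```

Replicating normal emails makes a false positive cost more than a missed spam during training, so rules that also fire on normal mail are pushed toward low or negative scores.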
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention generates Chinese keyword rules from a specific mail set, which addresses the poor timeliness of current Chinese keyword rules.
2. The invention can generate the keyword rules that best fit a specific user group, according to that group's definition of spam.
3. The invention improves keyword feature extraction: combining term frequency and document frequency removes the influence of some common words.
4. The invention computes rule scores with a neural network algorithm, which reduces the learning cost compared with the traditional genetic algorithm.
Description of the drawings
Fig. 1 is the data flow diagram of a specific implementation of the method of the invention.
Fig. 2 is the flow chart of the keyword selection method of the invention.
Fig. 3 is the flow chart of the method for determining keyword rule scores in the invention.
Detailed description of the embodiments
The invention is further explained below with reference to a specific embodiment.
This embodiment evaluates the method on the public Chinese email data set SEWMTest Corpus2011, using the misfiltering rate, the missed-filtering rate, the logistic average misfiltering rate, and a metric value as evaluation indexes. The data flow of this example is shown in Fig. 1. The detailed procedure of the Chinese keyword rule generation method for spam filtering is as follows:
Step 1: Reject the English emails in the SEWMTest Corpus2011 data set and keep the Chinese emails. The goal is to extract Chinese rules, and the SEWMTest Corpus2011 mail set contains English emails, so non-Chinese emails are first removed from the training set. To do this, each email is decoded to obtain the Unicode code points of its characters, and the email is checked for characters in the range \u4e00 to \u9fa5 inclusive. If the mail header or mail body contains a character in that range, the email is considered Chinese. Only such Chinese emails are retained, which yields 4740 spam emails and 5678 normal emails.
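The Chinese-email check in step 1 can be sketched with a regular expression over the decoded text. The function name `is_chinese_email` is hypothetical, and the inputs are assumed to be already decoded to Python strings.

```python
import re

# Matches any character in the range U+4E00 to U+9FA5 inclusive.
CJK_PATTERN = re.compile("[\u4e00-\u9fa5]")


def is_chinese_email(header_text, body_text):
    """Return True if the header or the body contains a character in the range."""
    return bool(CJK_PATTERN.search(header_text) or CJK_PATTERN.search(body_text))
```

For example, `is_chinese_email("Hello", "发票")` is true, while an email whose header and body are both pure ASCII is rejected.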
Step 2: Decode all emails in the mail set obtained in step 1. Keywords are obtained mainly from the mail subject and the mail body, so each email is decoded to obtain the Chinese text of its mail header and mail body.
Step 3: Process the Chinese text of the mail header and mail body produced in step 2 with a Chinese word segmentation algorithm to generate feature candidate words. This example uses the jieba Chinese word segmentation library, a popular open-source library that supports an accurate mode and a search engine mode. The accurate mode tries to obtain the most accurate segmentation, while the search engine mode subdivides sentences as far as possible to obtain fine-grained segments. This example uses the accurate mode.
Step 4: Compute the term frequency and document frequency of the feature candidate words obtained in step 3, and extract keyword features with the method above, comparing term frequency first and document frequency second (T% is set to 70%, an empirical value). The CCERT rule set obtained from the network contains 332 mail header rules and 154 mail body rules. This example is compared against it, so feature extraction selects 330 mail header rules and 150 mail body rules.
Step 5: Remove all built-in SpamAssassin rules and disable the Bayesian algorithm to eliminate the influence of other rules. To avoid generating excessively high scores, the threshold for classifying an email as spam is set to 1.0.
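The settings in step 5 and the keyword rules from step 4 could be written to a SpamAssassin configuration file along the lines of the sketch below. It uses the standard `required_score`, `use_bayes`, `header`, `body`, and `score` directives, but the rule names (`ZH_HEAD_0001` and so on), the initial score, and the function name `build_rules_cf` are hypothetical. Removing the built-in rules is not covered here, and whether the patterns match as intended depends on how SpamAssassin handles the mail's character encoding, which should be verified separately.

```python
import re


def build_rules_cf(head_keywords, body_keywords, initial_score=1.0):
    """Build the text of a SpamAssassin .cf file with one rule per keyword."""
    lines = ["required_score 1.0", "use_bayes 0"]

    def pattern(keyword):
        # Escape regex metacharacters and the '/' delimiter.
        return re.escape(keyword).replace("/", r"\/")

    for i, kw in enumerate(head_keywords, start=1):
        name = f"ZH_HEAD_{i:04d}"  # hypothetical naming scheme
        lines.append(f"header {name} Subject =~ /{pattern(kw)}/")
        lines.append(f"score {name} {initial_score}")
    for i, kw in enumerate(body_keywords, start=1):
        name = f"ZH_BODY_{i:04d}"  # hypothetical naming scheme
        lines.append(f"body {name} /{pattern(kw)}/")
        lines.append(f"score {name} {initial_score}")
    return "\n".join(lines) + "\n"
```

The placeholder scores only make the rules active for trigger collection; the trained scores from the later steps would replace them.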
Step 6: Divide the mail set obtained in step 1 into a training set and a test set, where the training set is 70% of the Chinese mail set and the test set is the remaining 30%. Use the mass-check script provided by SpamAssassin to apply the keyword rules generated in step 4 to the training set, obtaining the trigger records of these rules on spam and on normal email.
Step 7: Use the rule trigger records obtained in step 6 as the input of the neural network, and obtain the rule scores by iterating stochastic gradient descent. The difficulty of this step is choosing four parameters. This example sets ham_preference=2.0 (the default) and weight_decay=1.0 (the default). Because the threshold was lowered in step 5, learning_rate is reduced to 0.02 (the default is 2.0; 0.02 is an empirical value that worked relatively well). num_epochs determines the number of training iterations. The perceptron provides no convergence detection that would stop training automatically, so the number of iterations is decided by the user. To determine it, logic was added to SpamAssassin's perceptron implementation to print, every 10 epochs, the number of misclassified emails (normal emails filtered as spam, or spam filtered as normal email). Observation showed that the method has essentially converged at num_epochs=3000, so the number of iterations is set to 3000.
Step 8: Delete the rules whose score from the training in step 7 is 0, which gives the final automatically generated rules.
Step 9: Evaluate the final rules generated in step 8.
Steps 1 to 4 select keywords from the mail data set, and the flow chart of this part is shown in Fig. 2. Steps 4 to 8 determine the scores of the keyword rules, and the flow chart of this part is shown in Fig. 3.
Finally, step 9 compares the results of the example above with the results of filtering the same SEWMTest Corpus2011 mail set using the CCERT rules. The evaluation indexes are defined as follows:
Misfiltering rate hm%: the proportion of normal (non-spam) emails misidentified as spam among all normal emails.
Missed-filtering rate sm%: the proportion of spam emails misidentified as normal email among all spam emails.
Logistic average misfiltering rate lam%: defined as the geometric mean of the misfiltering rates of normal email and spam.
Metric <hm%=0.1, sm%>: the value of sm% when hm% = 0.1.
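The first three indexes can be computed as in the sketch below, which follows the definitions exactly as stated in the text, including lam% as a geometric mean. The function name `misfilter_rates` and the 0/1 label encoding are assumptions.

```python
import math


def misfilter_rates(labels, predictions):
    """labels / predictions: 1 for spam, 0 for normal email. Returns percentages."""
    ham = [p for y, p in zip(labels, predictions) if y == 0]
    spam = [p for y, p in zip(labels, predictions) if y == 1]
    # hm%: normal emails wrongly flagged as spam, over all normal emails.
    hm = 100.0 * sum(1 for p in ham if p == 1) / len(ham)
    # sm%: spam emails wrongly passed as normal, over all spam emails.
    sm = 100.0 * sum(1 for p in spam if p == 0) / len(spam)
    # lam%: geometric mean of the two rates, as defined in the text.
    lam = math.sqrt(hm * sm)
    return hm, sm, lam
```

The metric <hm%=0.1, sm%> additionally requires sweeping the spam threshold until hm% reaches 0.1 and reading off sm% at that point, which is not shown here.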
The comparison results are as follows:
The experiments show that the Chinese rule set generated in this example performs better than the rules provided by CCERT, mainly because the training set and the test set come from the same data set and are therefore more closely related. This indicates that rules trained on a user's own existing mail perform better, which demonstrates the value of the method and makes it worth promoting.
The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation. Any change made according to the shape or principle of the invention shall fall within its scope of protection.
Claims (5)
1. A Chinese keyword rule generation method for spam filtering, characterized in that: the method preprocesses a given mail data set and takes all words in the mail header and mail body of each email in the data set as keyword candidate words; selects keywords with a feature extraction method that combines term frequency and document frequency; filters the same mail data set with the resulting keyword rules to obtain the trigger records of those rules on spam and on normal email; uses these trigger records as the input of a neural network algorithm; trains the neural network by stochastic gradient descent until the filtering performance converges; and converts the trained weights into rule scores, the resulting rules being usable in rule-based mail filtering solutions; the method specifically comprising the following steps:
1) Preprocess the mail data set by mail screening, mail parsing, and Chinese word segmentation to obtain the keyword candidate word set;
2) Compute the term frequency and document frequency of every word in the candidate word set, and select keywords from the candidate word set by feature extraction that compares term frequency first and document frequency second;
3) Collect the keyword trigger records of each email in the mail data set, and format the trigger data;
4) Assign scores to the keyword rules with a neural network algorithm, based on the keyword trigger records above.
2. The Chinese keyword rule generation method for spam filtering according to claim 1, characterized in that: in step 1), mail screening means rejecting the purely English files in the mail data set; mail parsing means parsing the mail content according to the RFC822 and MIME protocols, splitting it into its parts, and selecting the mail header and mail body; and Chinese word segmentation means segmenting the text content of the mail header and mail body with a Chinese word segmentation tool.
3. The Chinese keyword rule generation method for spam filtering according to claim 1, characterized in that: in step 2), features are selected by combining term frequency and document frequency to determine the keywords, through the following steps:
2.1) Compute term frequency and document frequency, where term frequency is the number of times a word occurs in the documents and document frequency is the number of documents in which a candidate word appears;
2.2) Choose the N words with the highest term frequency in spam;
2.3) Filter the keywords with the formula spam(wi) / (spam(wi) + ham(wi)) > T%, where every wi that satisfies the formula is a keyword, wi is a word in the set of N highest-frequency words, spam(wi) is the number of spam emails that contain wi, ham(wi) is the number of normal emails that contain wi, and T% is a configured threshold.
4. The Chinese keyword rule generation method for spam filtering according to claim 1, characterized in that: in step 3), the open-source tool SpamAssassin is used to collect the keyword trigger records of each email in the mail data set, and the trigger data are formatted, through the following steps:
3.1) Disable all rules built into SpamAssassin and deactivate the Bayesian algorithm to eliminate the influence of other rules, then add the keyword rules generated in step 2);
3.2) Use the mass-check script provided by SpamAssassin to run SpamAssassin on every email in the training set, and record in a log all rules that each email triggers;
3.3) Post-process the log file to put the results into a structured form.
5. The Chinese keyword rule generation method for spam filtering according to claim 1, characterized in that: in step 4), a neural network algorithm assigns scores to the keyword rules generated in step 2), through the following steps:
4.1) First replicate the normal (non-spam) emails redundantly, where the number of copies of a normal email is given by 1 + (number_of_test_hit) * ham_preference, ham_preference is an input parameter that defaults to 2.0, and number_of_test_hit is the number of rules the email triggers;
4.2) Randomly assign each rule a weight within a specific range, where the range is determined by how many emails the rule triggers on;
4.3) Train with the neural network algorithm and stop after num_epochs iterations, where num_epochs is the number of training iterations; the weight_decay and bias parameters are specified for each iteration, weight_decay being the rate at which the weights decay in one iteration and bias being an offset used to smooth statistical anomalies;
4.4) Delete the rules whose trained score is 0, the remaining rules being the final generated rules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810521174.0A CN108763449A (en) | 2018-05-28 | 2018-05-28 | A kind of Chinese key rule generating method of Spam filtering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810521174.0A CN108763449A (en) | 2018-05-28 | 2018-05-28 | A kind of Chinese key rule generating method of Spam filtering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108763449A true CN108763449A (en) | 2018-11-06 |
Family
ID=64005925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810521174.0A Pending CN108763449A (en) | 2018-05-28 | 2018-05-28 | A kind of Chinese key rule generating method of Spam filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763449A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101136874A (en) * | 2007-07-25 | 2008-03-05 | 华南理工大学 | Compound decision based anti-rubbish E-mail error filtering method and system |
US20100153381A1 (en) * | 2000-05-12 | 2010-06-17 | Harris Technology, Llc | Automatic Mail Rejection Feature |
CN102214320A (en) * | 2010-04-12 | 2011-10-12 | 宋威 | Neural network training method and junk mail filtering method using same |
CN104270304A (en) * | 2014-10-14 | 2015-01-07 | 四川神琥科技有限公司 | Detection and analysis method for image emails |
Non-Patent Citations (1)
Title |
---|
黄康泉: "企业级邮件处理系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114866349A (en) * | 2022-07-06 | 2022-08-05 | 深圳市永达电子信息股份有限公司 | Network information filtering method |
CN114866349B (en) * | 2022-07-06 | 2022-11-15 | 深圳市永达电子信息股份有限公司 | Network information filtering method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gómez Hidalgo et al. | Content based SMS spam filtering | |
US7751620B1 (en) | Image spam filtering systems and methods | |
CN103441924B (en) | A kind of rubbish mail filtering method based on short text and device | |
CN103024746B (en) | System and method for processing spam short messages for telecommunication operator | |
CN105912576B (en) | Emotion classification method and system | |
CN103336766B (en) | Short text garbage identification and modeling method and device | |
Mohamad et al. | An evaluation on the efficiency of hybrid feature selection in spam email classification | |
CN105871887B (en) | Client-based individual electronic mail filtering system and filter method | |
CN107609121A (en) | Newsletter archive sorting technique based on LDA and word2vec algorithms | |
CN106528642A (en) | TF-IDF feature extraction based short text classification method | |
CN109446404A (en) | A kind of the feeling polarities analysis method and device of network public-opinion | |
CN101227435A (en) | Method for filtering Chinese junk mail based on Logistic regression | |
CN101784022A (en) | Method and system for filtering and classifying short messages | |
CN101540017B (en) | Feature extracting method based on byte level n-gram and twit filter | |
CN101408883A (en) | Method for collecting network public feelings viewpoint | |
CN108199951A (en) | A kind of rubbish mail filtering method based on more algorithm fusion models | |
CN104317891B (en) | A kind of method and device that label is marked to the page | |
CN101295381A (en) | Junk mail detecting method | |
Temma et al. | The document similarity index based on the Jaccard distance for mail filtering | |
CN104731772B (en) | Improved feature evaluation function based Bayesian spam filtering method | |
CN105323248B (en) | A kind of rule-based interactive Chinese Spam Filtering method | |
CN115622806B (en) | Network intrusion detection method based on BERT-CGAN | |
CN106649338B (en) | Information filtering strategy generation method and device | |
CN109558486A (en) | Electric power customer service client's demand intelligent identification Method | |
Iyengar et al. | Integrated spam detection for multilingual emails |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181106 |