CN102880952A - Method for collecting and classifying E-mails - Google Patents


Info

Publication number
CN102880952A
Authority
CN
China
Prior art keywords
confidence
degree
mails
informer
email
Legal status
Pending
Application number
CN2012103276245A
Other languages
Chinese (zh)
Inventor
林延中
潘庆峰
Current Assignee
MAIMAILTECH (BEIJING) CO Ltd
Original Assignee
MAIMAILTECH (BEIJING) CO Ltd
Priority date
2012-09-07
Filing date
2012-09-07
Publication date
2013-01-16
Application filed by MAIMAILTECH (BEIJING) CO Ltd
Priority to CN2012103276245A
Priority to PCT/CN2012/085097 (WO2014036788A1)
Publication of CN102880952A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]

Abstract

The invention discloses a method for collecting and classifying e-mails. The method comprises the following steps:
- Scan all reported e-mails on a server and extract, as target e-mails, those reported at least n times. Here n is a preset value, and the reported e-mails include both e-mails reported as normal and e-mails reported as spam.
- Compute the confidence of each target e-mail to obtain a result.
- Judge from that result whether each target e-mail is spam or a normal e-mail, and store it in a database.

With the invention, no dedicated staff are needed to classify and label large numbers of e-mails. Instead, a computer collects users' feedback directly. This reduces manual workload and ensures classification accuracy. Because no one has to read the e-mails, user privacy is also protected.

Description

Method for collecting and classifying e-mails
Technical field
The present invention relates to the field of communication technology, and in particular to a method for collecting and classifying e-mails.
Background
Text classification currently relies on artificial-intelligence classification algorithms. These algorithms must first learn from training samples and build a corresponding discriminative model before they can classify text, so training samples have to be obtained first. The current way to obtain them is to have people label a batch of samples directly by hand, marking each e-mail as spam or non-spam.
A classification algorithm needs enough training information: at least tens of thousands of training samples must be learned before a reliable model can be built. Dedicated staff therefore have to classify and label tens of thousands of e-mails, which is an enormous workload. People doing this repetitive work for long periods also make mistakes easily, which raises the sample error rate and degrades what the algorithm ultimately learns. In addition, labeling e-mails requires people to read users' mail, which infringes on user privacy.
Summary of the invention
The technical problem to be solved by embodiments of the invention is to provide a method for collecting and classifying e-mails that does not require dedicated staff to classify and label large numbers of e-mails. Instead, a computer directly collects users' feedback. This reduces manual workload and ensures classification accuracy, and because no one needs to read the e-mails, it also protects user privacy.
To solve the above technical problem, an embodiment of the invention provides a method for collecting and classifying e-mails, comprising:
- scanning all reported e-mails on a server and extracting the target e-mails that have been reported at least n times, where n is a preset value and the reported e-mails include e-mails reported as normal and e-mails reported as spam;
- calculating the confidence of the target e-mails to obtain a calculation result;
- judging, according to the calculation result, whether each target e-mail is spam or a normal e-mail, and storing it in a database.
As an improvement of the above scheme, the step of calculating the confidence of the target e-mail comprises three parts. First, sum the confidences of all reporters who reported the target e-mail as a normal e-mail to obtain the total normal-mail confidence X. Second, sum the confidences of all reporters who reported it as spam to obtain the total spam confidence Y. Third, calculate the absolute difference |X-Y| as the calculation result.
As an improvement of the above scheme, the step of judging, according to the calculation result, whether the target e-mail is spam or a normal e-mail works as follows. Compare |X-Y| with a threshold T. If |X-Y| is less than T, the e-mail is not judged for the time being. Otherwise, compare X and Y: if X is greater than Y, the e-mail is judged to be a normal e-mail; if X is less than Y, it is judged to be spam.
As an improvement of the above scheme, before the step of calculating the confidence of the target e-mail, the method further comprises presetting the initial confidence of a reporter who reports an e-mail for the first time to 1.
As an improvement of the above scheme, the method further comprises updating the reporters' confidences. Reporters whose reports agree with the final verdict have their confidence increased, and reporters whose reports disagree with it have their confidence decreased.
As an improvement of the above scheme, the confidence increases more slowly than it decreases.
As an improvement of the above scheme, the confidence has a maximum and a minimum value. Once it reaches the maximum it no longer increases, and once it drops to the minimum it no longer decreases.
The invention has the following beneficial effects. A computer scans all reported e-mails on the server and extracts the target e-mails reported at least the system's preset number of times. It then calculates a confidence for each target e-mail from its reporters' confidences. From the result, it judges whether each reported e-mail is spam or a normal e-mail and files it in the corresponding database. Because the computer processes user feedback directly on the basis of confidence, manual effort and workload are reduced and classification accuracy is ensured. No one needs to read the e-mails, so user privacy is protected.
Brief description of the drawings
Fig. 1 is a flow diagram of a first embodiment of the method for collecting and classifying e-mails according to the invention;
Fig. 2 is a flow diagram of a second embodiment of the method for collecting and classifying e-mails according to the invention;
Fig. 3 is a flow diagram of a third embodiment of the method for collecting and classifying e-mails according to the invention;
Fig. 4 is a flow diagram of a fourth embodiment of the method for collecting and classifying e-mails according to the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flow diagram of the first embodiment of the method for collecting and classifying e-mails according to the invention. It comprises:
S100: scan all reported e-mails on the server and extract the target e-mails that have been reported at least n times.
n is a preset value. The reported e-mails include e-mails reported as normal and e-mails reported as spam.
Note that the computer scans all reported e-mails on the server automatically and repeats the scan at regular intervals. The tolerance value n can be set according to the circumstances; preferably, n is 3.
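Step S100 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The `(mail_id, reporter_id, label)` report record, the function name `extract_targets`, and counting every report record (including repeats by the same reporter) are all hypothetical choices.

```python
from collections import defaultdict

# Hypothetical report record: (mail_id, reporter_id, label), label "normal" or "spam".
Report = tuple[str, str, str]


def extract_targets(reports: list[Report], n: int = 3) -> dict[str, list[Report]]:
    """Group reports by mail and keep the mails reported at least n times (S100)."""
    by_mail: dict[str, list[Report]] = defaultdict(list)
    for report in reports:
        by_mail[report[0]].append(report)
    # Reports in both directions ("normal" and "spam") count toward n.
    return {mail: rs for mail, rs in by_mail.items() if len(rs) >= n}


reports = [
    ("M", "A", "normal"), ("M", "B", "normal"),
    ("M", "C", "spam"), ("M", "D", "spam"),
    ("K", "A", "spam"),  # reported only once, so not yet a target
]
print(sorted(extract_targets(reports)))  # ['M']
```

The preferred value n = 3 is used as the default argument.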
S101: calculate the confidence of the target e-mails to obtain a calculation result.
S102: judge, according to the calculation result, whether each target e-mail is spam or a normal e-mail, and store it in a database.
Note that e-mails judged to be spam are stored in the spam database, and e-mails judged to be normal are stored in the normal-mail database.
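The filing rule of step S102 can be illustrated with Python's built-in `sqlite3` module. The patent does not specify a schema, so several simplifications here are hypothetical: the in-memory database, the table names `spam_mail` and `normal_mail`, and storing only a mail identifier.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the spam and normal-mail databases."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spam_mail (mail_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE normal_mail (mail_id TEXT PRIMARY KEY)")
    return conn


def store_verdict(conn: sqlite3.Connection, mail_id: str, verdict: str) -> None:
    """File a judged mail in the table that matches its verdict."""
    # Table names cannot be bound as SQL parameters, so they are picked
    # from this fixed mapping rather than taken from the caller.
    table = {"spam": "spam_mail", "normal": "normal_mail"}[verdict]
    conn.execute(f"INSERT OR IGNORE INTO {table} (mail_id) VALUES (?)", (mail_id,))
    conn.commit()


conn = open_store()
store_verdict(conn, "M", "normal")
store_verdict(conn, "N", "spam")
print(conn.execute("SELECT mail_id FROM normal_mail").fetchall())  # [('M',)]
```

`INSERT OR IGNORE` makes re-filing the same mail on a later scan harmless.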
Fig. 2 is a flow diagram of the second embodiment of the method for collecting and classifying e-mails according to the invention. It comprises:
S200: scan all reported e-mails on the server and extract the target e-mails that have been reported at least n times.
n is a preset value. The reported e-mails include e-mails reported as normal and e-mails reported as spam.
Note that the computer scans all reported e-mails on the server automatically and repeats the scan at regular intervals. The tolerance value n can be set according to the circumstances; preferably, n is 3.
S201: sum the confidences of all reporters who reported the target e-mail as a normal e-mail to obtain the total normal-mail confidence X.
S202: sum the confidences of all reporters who reported the target e-mail as spam to obtain the total spam confidence Y.
Note that steps S201 and S202 have no fixed order and can be performed simultaneously.
S203: calculate the absolute value |X-Y| of the difference between the total normal-mail confidence X and the total spam confidence Y to obtain the calculation result.
S204: judge, according to the calculation result, whether each target e-mail is spam or a normal e-mail, and store it in a database.
Note that e-mails judged to be spam are stored in the spam database, and e-mails judged to be normal are stored in the normal-mail database.
For example, suppose a scan finds that mail M has been reported 4 times. This exceeds the preset value of 3 (the default), so M is extracted as a target e-mail. Reporters A and B reported M as a normal e-mail, and reporters C and D reported it as spam. The confidences of reporters A, B, C and D are 5, 10, 3 and 8 respectively. The total normal-mail confidence is X = 5 + 10 = 15, the total spam confidence is Y = 3 + 8 = 11, and |X-Y| = |15 - 11| = 4.
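Steps S201 to S203 amount to two weighted sums and an absolute difference. The sketch below reproduces the example for mail M. The function name `aggregate` and the `(reporter_id, label)` input format are hypothetical.

```python
def aggregate(reports: list[tuple[str, str]],
              confidence: dict[str, float]) -> tuple[float, float, float]:
    """Return (X, Y, |X-Y|) for one target mail (steps S201 to S203).

    reports: (reporter_id, label) pairs, label "normal" or "spam".
    confidence: current confidence of each reporter.
    """
    x = sum(confidence[r] for r, label in reports if label == "normal")
    y = sum(confidence[r] for r, label in reports if label == "spam")
    return x, y, abs(x - y)


# The worked example for mail M above.
reports_m = [("A", "normal"), ("B", "normal"), ("C", "spam"), ("D", "spam")]
print(aggregate(reports_m, {"A": 5, "B": 10, "C": 3, "D": 8}))  # (15, 11, 4)
```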
Fig. 3 is a flow diagram of the third embodiment of the method for collecting and classifying e-mails according to the invention. It comprises:
S300: scan all reported e-mails on the server and extract the target e-mails that have been reported at least n times.
n is a preset value. The reported e-mails include e-mails reported as normal and e-mails reported as spam.
Note that the computer scans all reported e-mails on the server automatically and repeats the scan at regular intervals. The tolerance value n can be set according to the circumstances; preferably, n is 3.
S301: sum the confidences of all reporters who reported the target e-mail as a normal e-mail to obtain the total normal-mail confidence X.
S302: sum the confidences of all reporters who reported the target e-mail as spam to obtain the total spam confidence Y.
Note that steps S301 and S302 have no fixed order and can be performed simultaneously.
S303: calculate the absolute value |X-Y| of the difference between the total normal-mail confidence X and the total spam confidence Y to obtain the calculation result.
S304: compare |X-Y| with the threshold T and determine whether |X-Y| is less than T.
Note that the threshold T can be preset according to the circumstances. T is usually set higher than the initial confidence; preferably, T is 3.
If |X-Y| is less than T, the e-mail is not judged for the time being.
Note that a target e-mail that is not judged for the time being remains in temporary storage on the server and is judged in a later scan.
If |X-Y| is not less than T, X and Y are compared. If X is greater than Y, the e-mail is judged to be a normal e-mail; if X is less than Y, it is judged to be spam.
Note that e-mails judged to be spam are stored in the spam database, and e-mails judged to be normal are stored in the normal-mail database.
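The S304 decision rule can be written as a small function. It returns no verdict when the mail is deferred. The name `decide` and the `None` convention for "not judged yet" are hypothetical. A mail with |X-Y| exactly equal to T is decided, because the rule only defers when |X-Y| is strictly less than T.

```python
from typing import Optional


def decide(x: float, y: float, t: float = 3) -> Optional[str]:
    """Step S304: defer when |X-Y| < T, otherwise pick the side with more weight.

    Assumes T > 0, so a tie (X == Y) is always deferred rather than decided.
    """
    if abs(x - y) < t:
        return None  # not judged yet; the mail stays in temporary storage
    return "normal" if x > y else "spam"


print(decide(15, 13), decide(15, 11), decide(11, 15))  # None normal spam
```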
For example, suppose a scan finds that mail m has been reported 4 times. This exceeds the preset value of 3 (the default), so m is extracted as a target e-mail. Reporters a and b reported m as a normal e-mail, and reporters c and d reported it as spam. The confidences of reporters a, b, c and d are 5, 10, 5 and 8 respectively. Then X = 5 + 10 = 15 and Y = 5 + 8 = 13, so |X-Y| = |15 - 13| = 2. With T preset to 3, |X-Y| < T, so m is not judged for the time being. It remains in temporary storage on the server and is judged in a later scan.
As another example, suppose a scan finds that mail M has been reported 4 times. This exceeds the preset value of 3, so M is extracted as a target e-mail. Reporters A and B reported M as a normal e-mail, and reporters C and D reported it as spam. Let the confidences of reporters A, B, C and D be 5, 10, 3 and 8 respectively. Then X = 5 + 10 = 15, Y = 3 + 8 = 11, and |X-Y| = |15 - 11| = 4. With T preset to 3, |X-Y| > T, so X and Y are compared. Since X = 15 > Y = 11, M is judged to be a normal e-mail and is filed in the normal-mail database.
Now let the confidences of reporters A, B, C and D instead be 3, 8, 5 and 10 respectively. Then X = 3 + 8 = 11, Y = 5 + 10 = 15, and |X-Y| = |11 - 15| = 4. With T preset to 3, |X-Y| > T, so X and Y are compared. Since X = 11 < Y = 15, M is judged to be spam and is filed in the spam database.
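Deferral only makes sense together with the periodic rescan described for S300. The sketch below runs one scan and carries undecided mails over to the next scan. In this sketch, a mail with fewer than n reports is carried over the same way as a mail with |X-Y| < T. The name `scan`, the `(mail_id, reporter_id, label)` records and the later reporter e (confidence 4) are hypothetical.

```python
from collections import defaultdict


def scan(reports, confidence, n=3, t=3):
    """One periodic scan over (mail_id, reporter_id, label) report records.

    Returns (verdicts, pending). verdicts maps mail_id to "normal" or "spam";
    pending keeps the records of every mail that was not judged, either
    because it has fewer than n reports or because |X-Y| < T.
    """
    by_mail = defaultdict(list)
    for mail_id, reporter, label in reports:
        by_mail[mail_id].append((reporter, label))
    verdicts, pending = {}, []
    for mail_id, rs in by_mail.items():
        x = sum(confidence[r] for r, lab in rs if lab == "normal")
        y = sum(confidence[r] for r, lab in rs if lab == "spam")
        if len(rs) >= n and abs(x - y) >= t:
            verdicts[mail_id] = "normal" if x > y else "spam"
        else:
            # Carry the mail over; later scans see these reports plus new ones.
            pending.extend((mail_id, r, lab) for r, lab in rs)
    return verdicts, pending


conf = {"a": 5, "b": 10, "c": 5, "d": 8, "e": 4}
first = [("m", "a", "normal"), ("m", "b", "normal"), ("m", "c", "spam"), ("m", "d", "spam")]
verdicts, pending = scan(first, conf)
print(verdicts, len(pending))  # {} 4
# A later report from hypothetical reporter e tips the balance: X = 19, Y = 13.
verdicts, pending = scan(pending + [("m", "e", "normal")], conf)
print(verdicts, pending)  # {'m': 'normal'} []
```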
Fig. 4 is a flow diagram of the fourth embodiment of the method for collecting and classifying e-mails according to the invention. It comprises:
S400: scan all reported e-mails on the server and extract the target e-mails that have been reported at least n times.
n is a preset value. The reported e-mails include e-mails reported as normal and e-mails reported as spam.
Note that the computer scans all reported e-mails on the server automatically and repeats the scan at regular intervals. The tolerance value n can be set according to the circumstances; preferably, n is 3.
S401: preset the initial confidence of a reporter who reports an e-mail for the first time to 1.
S402: sum the confidences of all reporters who reported the target e-mail as a normal e-mail to obtain the total normal-mail confidence X.
Note that steps S401 and S402 have no fixed order and can be performed simultaneously.
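One way to realize S401 is to register a first-time reporter with confidence 1 the moment their confidence is first looked up while summing. The names `confidence_of`, `total_confidence` and `INITIAL_CONFIDENCE`, and the use of `dict.setdefault` as the registry, are hypothetical.

```python
INITIAL_CONFIDENCE = 1  # preset for a reporter's first report (step S401)


def confidence_of(reporter_id: str, confidence: dict[str, float]) -> float:
    """Look up a reporter's confidence, registering first-time reporters at 1."""
    return confidence.setdefault(reporter_id, INITIAL_CONFIDENCE)


def total_confidence(reporters: list[str], confidence: dict[str, float]) -> float:
    """Sum the confidences of the given reporters (as in steps S402 and S403)."""
    return sum(confidence_of(r, confidence) for r in reporters)


confidence = {"B": 14}  # A has never reported before
print(total_confidence(["A", "B"], confidence), confidence)  # 15 {'B': 14, 'A': 1}
```

Because registration happens during the lookup, S401 and S402 need no fixed order, which matches the note above.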
S403: sum the confidences of all reporters who reported the target e-mail as spam to obtain the total spam confidence Y.
S404: calculate the absolute value |X-Y| of the difference between the total normal-mail confidence X and the total spam confidence Y to obtain the calculation result.
S405: compare |X-Y| with the threshold T and determine whether |X-Y| is less than T.
Note that the threshold T can be preset according to the circumstances. T is usually set higher than the initial confidence; preferably, T is 3.
If |X-Y| is less than T, the e-mail is not judged for the time being.
Note that a target e-mail that is not judged for the time being remains in temporary storage on the server and is judged in a later scan.
If |X-Y| is not less than T, X and Y are compared. If X is greater than Y, the e-mail is judged to be a normal e-mail; if X is less than Y, it is judged to be spam.
Note that e-mails judged to be spam are stored in the spam database, and e-mails judged to be normal are stored in the normal-mail database.
S406: update the reporters' confidences. Reporters whose reports agree with the final verdict have their confidence increased, and reporters whose reports disagree with it have their confidence decreased.
Note that the amounts of increase and decrease can be preset as required. Preferably, the confidence increases by 1 and decreases by 10% or by 1, whichever is larger.
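With the preferred amounts, the two update directions are a fixed +1 and a decrease of max(10% of the current value, 1). The helper names and parameter names below are hypothetical. The percentage is computed as `current * pct / 100`, so a value such as 15 drops to exactly 13.5.

```python
def raise_confidence(current: float, step: float = 1) -> float:
    """Reporter agreed with the final verdict: add a fixed step (preferably +1)."""
    return current + step


def lower_confidence(current: float, pct: float = 10, min_step: float = 1) -> float:
    """Reporter disagreed: subtract pct percent of current or min_step, whichever is larger."""
    return current - max(current * pct / 100, min_step)


print(raise_confidence(14), lower_confidence(3), lower_confidence(15))  # 15 2 13.5
```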
More preferably, the confidence increases more slowly than it decreases.
Note that making the confidence increase more slowly than it decreases ensures that a reporter with high confidence is genuinely more trustworthy and more expert, which makes the final verdict more accurate.
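A short calculation makes the asymmetry concrete. The helper below assumes the preferred +1 and max(10%, 1) amounts and ignores the maximum and minimum bounds. It counts how many agreeing reports a reporter needs to win back the confidence lost on a single disagreeing report. The name `reports_to_recover` is hypothetical.

```python
def reports_to_recover(start: float) -> int:
    """Agreeing reports (+1 each) needed to regain the level held before one miss."""
    level = start - max(start * 10 / 100, 1)  # one disagreeing report
    count = 0
    while level < start:
        level += 1
        count += 1
    return count


for start in (5, 15, 20, 40):
    print(start, reports_to_recover(start))
# 5 1
# 15 2
# 20 2
# 40 4
```

The higher a reporter's confidence, the more correct reports a single mistake costs. A high confidence therefore reflects a long and consistent track record.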
More preferably, the confidence has a maximum and a minimum value. Once it reaches the maximum it no longer increases, and once it drops to the minimum it no longer decreases.
Note that the maximum and minimum can be preset as required. Preferably, the maximum is 50 and the minimum is 0.
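The bounds can be applied as a clamp after each update. The sketch combines the preferred amounts with the preferred limits of 0 and 50. The function and constant names are hypothetical.

```python
MAX_CONFIDENCE = 50  # preferred maximum
MIN_CONFIDENCE = 0   # preferred minimum


def update_confidence(current: float, agreed: bool) -> float:
    """Apply the S406 update and keep the result within [MIN, MAX]."""
    if agreed:
        new = current + 1
    else:
        new = current - max(current * 10 / 100, 1)
    # Stop at the bounds instead of moving past them.
    return min(MAX_CONFIDENCE, max(MIN_CONFIDENCE, new))


print(update_confidence(50, True), update_confidence(0.5, False), update_confidence(14, True))
# 50 0 15
```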
For example, suppose a scan finds that mail m has been reported 4 times. This exceeds the preset value of 3 (the default), so m is extracted as a target e-mail. Reporters a and b reported m as a normal e-mail, and reporters c and d reported it as spam. The confidences of reporters a, b, c and d are 5, 10, 5 and 8 respectively. Then X = 5 + 10 = 15 and Y = 5 + 8 = 13, so |X-Y| = |15 - 13| = 2. With T preset to 3, |X-Y| < T, so m is not judged for the time being. It remains in temporary storage on the server and is judged in a later scan.
As another example, suppose a scan finds that mail M has been reported 4 times. This exceeds the preset value of 3, so M is extracted as a target e-mail. Reporters A and B reported M as a normal e-mail, and reporters C and D reported it as spam.
- Reporter A is reporting for the first time and is given the initial confidence 1. The confidences of reporters B, C and D are 14, 3 and 8 respectively.
- Then X = 1 + 14 = 15, Y = 3 + 8 = 11, and |X-Y| = |15 - 11| = 4. With T preset to 3, |X-Y| > T, so X and Y are compared. Since X = 15 > Y = 11, M is judged to be a normal e-mail and is filed in the normal-mail database.
- The reporters' confidences are then updated. Reporters A and B agree with the verdict, so each gains 1: A's confidence becomes 2 and B's becomes 15.
- Reporters C and D disagree with the verdict, so each loses 10% or 1, whichever is larger. C's original confidence is 3; 10% of that (0.3) is less than 1, so C loses 1 and drops to 2. D's original confidence is 8; 10% of that (0.8) is less than 1, so D loses 1 and drops to 7.

Now let the confidences of reporters A, B, C and D instead be 3, 15, 5 and 20 respectively.
- Then X = 3 + 15 = 18, Y = 5 + 20 = 25, and |X-Y| = |18 - 25| = 7. With T preset to 3, |X-Y| > T, so X and Y are compared. Since X = 18 < Y = 25, M is judged to be spam and is filed in the spam database.
- The reporters' confidences are then updated. Reporters C and D agree with the verdict, so each gains 1: C's confidence becomes 6 and D's becomes 21.
- Reporters A and B disagree with the verdict, so each loses 10% or 1, whichever is larger. A's original confidence is 3; 10% of that (0.3) is less than 1, so A drops by 1 to 2. B's original confidence is 15; 10% of that (1.5) is greater than 1, so B drops by 1.5 to 13.5.
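The fourth embodiment can be combined into one routine that reproduces both examples for mail M and the deferred example for mail m. It covers initial confidence 1, the weighted sums, the threshold test, and the clamped update. The name `judge_and_update` is hypothetical. The patent describes updates only after a final verdict, so this sketch assumes that deferred mails leave all confidences unchanged.

```python
def judge_and_update(reports, confidence, t=3, lo=0, hi=50):
    """Judge one target mail, then update every reporter's confidence in place.

    reports: (reporter_id, label) pairs; label is "normal" or "spam".
    Returns "normal", "spam", or None when |X-Y| < T (mail deferred).
    """
    for r, _ in reports:
        confidence.setdefault(r, 1)  # first-time reporters start at 1 (S401)
    x = sum(confidence[r] for r, lab in reports if lab == "normal")
    y = sum(confidence[r] for r, lab in reports if lab == "spam")
    if abs(x - y) < t:
        return None  # assumption: deferred mails leave confidences unchanged
    verdict = "normal" if x > y else "spam"
    for r, lab in reports:
        c = confidence[r]
        c = c + 1 if lab == verdict else c - max(c * 10 / 100, 1)
        confidence[r] = min(hi, max(lo, c))
    return verdict


reports_m = [("A", "normal"), ("B", "normal"), ("C", "spam"), ("D", "spam")]

first = {"B": 14, "C": 3, "D": 8}  # A is reporting for the first time
print(judge_and_update(reports_m, first), first)
# normal {'B': 15, 'C': 2, 'D': 7, 'A': 2}

second = {"A": 3, "B": 15, "C": 5, "D": 20}
print(judge_and_update(reports_m, second), second)
# spam {'A': 2, 'B': 13.5, 'C': 6, 'D': 21}
```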
In summary, a computer scans all reported e-mails on the server and extracts the target e-mails reported at least the system's preset number of times. It calculates a confidence for each target e-mail from its reporters' confidences. From the result, it judges whether each reported e-mail is spam or a normal e-mail and files it in the corresponding database. Because the computer processes user feedback directly on the basis of confidence, manual effort and workload are reduced and classification accuracy is ensured. No one needs to read the e-mails, so user privacy is protected.
The above are preferred embodiments of the invention. Those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention. Such improvements and modifications shall also fall within the scope of protection of the invention.

Claims (7)

1. A method for collecting and classifying e-mails, characterized by comprising:
scanning all reported e-mails on a server and extracting the target e-mails that have been reported at least n times, where n is a preset value and the reported e-mails include e-mails reported as normal and e-mails reported as spam;
calculating the confidence of the target e-mails to obtain a calculation result;
judging, according to the calculation result, whether each target e-mail is spam or a normal e-mail, and storing it in a database.
2. The method for collecting and classifying e-mails according to claim 1, characterized in that the step of calculating the confidence of the target e-mails comprises:
summing the confidences of all reporters who reported the target e-mail as a normal e-mail to obtain a total normal-mail confidence X;
summing the confidences of all reporters who reported the target e-mail as spam to obtain a total spam confidence Y;
calculating the absolute value |X-Y| of the difference between the total normal-mail confidence X and the total spam confidence Y to obtain the calculation result.
3. The method for collecting and classifying e-mails according to claim 2, characterized in that the step of judging, according to the calculation result, whether the target e-mail is spam or a normal e-mail comprises:
comparing the absolute value |X-Y| of the difference between the total normal-mail confidence X and the total spam confidence Y with a threshold T, and determining whether |X-Y| is less than T;
if so, not judging the e-mail for the time being;
if not, comparing X and Y, judging the e-mail to be a normal e-mail if X is greater than Y, and judging it to be spam if X is less than Y.
4. The method for collecting and classifying e-mails according to claim 2, characterized in that, before the step of calculating the confidence of the target e-mails, the method further comprises:
presetting the initial confidence of a reporter who reports an e-mail for the first time to 1.
5. The method for collecting and classifying e-mails according to claim 1, characterized by further comprising:
updating the reporters' confidences by increasing the confidence of reporters whose reports agree with the final verdict and decreasing the confidence of reporters whose reports disagree with the final verdict.
6. The method for collecting and classifying e-mails according to claim 5, characterized in that the confidence increases more slowly than it decreases.
7. The method for collecting and classifying e-mails according to claim 5, characterized in that the confidence has a maximum value and a minimum value; the confidence no longer increases after reaching the maximum value and no longer decreases after dropping to the minimum value.
CN2012103276245A 2012-09-07 2012-09-07 Method for collecting and classifying E-mails Pending CN102880952A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2012103276245A CN102880952A (en) 2012-09-07 2012-09-07 Method for collecting and classifying E-mails
PCT/CN2012/085097 WO2014036788A1 (en) 2012-09-07 2012-11-23 A method for collecting and classification email

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012103276245A CN102880952A (en) 2012-09-07 2012-09-07 Method for collecting and classifying E-mails

Publications (1)

Publication Number Publication Date
CN102880952A 2013-01-16

Family

ID=47482268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012103276245A Pending CN102880952A (en) 2012-09-07 2012-09-07 Method for collecting and classifying E-mails

Country Status (2)

Country Link
CN (1) CN102880952A (en)
WO (1) WO2014036788A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970832A (en) * 2014-04-01 2014-08-06 百度在线网络技术(北京)有限公司 Method and device for recognizing spam
CN104424280A (en) * 2013-08-30 2015-03-18 格博信息技术(苏州)有限公司 Push follow-up method and system thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103984703B (en) * 2014-04-22 2017-04-12 新浪网技术(中国)有限公司 Mail classification method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040139160A1 (en) * 2003-01-09 2004-07-15 Microsoft Corporation Framework to enable integration of anti-spam technologies
CN1719812A (en) * 2005-08-08 2006-01-11 北京中星微电子有限公司 Method and system for filtering refuse E-mail
CN101674264A (en) * 2009-10-20 2010-03-17 哈尔滨工程大学 Spam detection device and method based on user relationship mining and credit evaluation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6161130A (en) * 1998-06-23 2000-12-12 Microsoft Corporation Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set
KR20050002320A (en) * 2003-06-30 2005-01-07 신동준 E-mail managing system and method thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424280A (en) * 2013-08-30 2015-03-18 格博信息技术(苏州)有限公司 Push follow-up method and system thereof
CN104424280B (en) * 2013-08-30 2018-10-23 格博信息技术(苏州)有限公司 Push follow-up method and its system
CN103970832A (en) * 2014-04-01 2014-08-06 百度在线网络技术(北京)有限公司 Method and device for recognizing spam

Also Published As

Publication number Publication date
WO2014036788A1 (en) 2014-03-13

Similar Documents

Publication Publication Date Title
CN104040963B (en) The system and method for carrying out spam detection for the frequency spectrum using character string
CN104067567B (en) System and method for carrying out spam detection using character histogram
CN101674264B (en) Spam detection device and method based on user relationship mining and credit evaluation
CN107169001A (en) A kind of textual classification model optimization method based on mass-rent feedback and Active Learning
US20120136812A1 (en) Method and system for machine-learning based optimization and customization of document similarities calculation
CN104050556B (en) The feature selection approach and its detection method of a kind of spam
CN109102028A (en) Based on improved fast density peak value cluster and LOF outlier detection algorithm
CN103024746A (en) System and method for processing spam short messages for telecommunication operator
CN113518063B (en) Network intrusion detection method and system based on data enhancement and BilSTM
CN109359137B (en) User growth portrait construction method based on feature screening and semi-supervised learning
CN111798312A (en) Financial transaction system abnormity identification method based on isolated forest algorithm
CN107688742B (en) Large-scale rapid mobile application APP detection and analysis method
TW200949570A (en) Method for filtering e-mail and mail filtering system thereof
CN102956023A (en) Bayes classification-based method for fusing traditional meteorological data with perception data
CN108133393A (en) Data processing method and system
CN113362299B (en) X-ray security inspection image detection method based on improved YOLOv4
CN102880952A (en) Method for collecting and classifying E-mails
CN106156105A (en) Email polymerization sorting technique and device
CN110213152A (en) Identify method, apparatus, server and the storage medium of spam
CN106644035B (en) Vibration source identification method and system based on time-frequency transformation characteristics
CN109166025A (en) A kind of checking method and relevant apparatus
CN104376304B (en) A kind of recognition methods of text advertisements image and device
CN103595614A (en) User feedback based junk mail detection method
CN107992508A (en) A kind of Chinese email signature extracting method and system based on machine learning
CN111209158B (en) Mining monitoring method and cluster monitoring system for server cluster

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130116