CN101764765A - Spam mail filtering method based on user interest - Google Patents

Spam mail filtering method based on user interest

Info

Publication number
CN101764765A
CN101764765A CN200910242936A CN200910242936A CN101764765A CN 101764765 A CN101764765 A CN 101764765A CN 200910242936 A CN200910242936 A CN 200910242936A CN 200910242936 A CN200910242936 A CN 200910242936A CN 101764765 A CN101764765 A CN 101764765A
Authority
CN
China
Prior art keywords
mail
user
user interest
interest
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910242936A
Other languages
Chinese (zh)
Inventor
谭营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2009-12-21
Filing date
2009-12-21
Publication date
2010-06-30
Application filed by Peking University
Priority to CN200910242936A
Publication of CN101764765A

Abstract

The invention discloses a spam mail filtering method based on user interest. The method comprises the following steps: after each user receives a mail, parsing each user's mail to obtain the mail title, mail text, recipient address and sender address; segmenting the mail title and mail text into words, generating a feature vector from the segmented title and text and a detector set, and training on each user's own training set to generate a separate classifier model for each user; when a new mail is received, classifying it with the classifier model of the corresponding user; and, when a change in user interest is detected, retraining that user's classifier model with the mail, where user interest is expressed through the user's own definition of spam. The invention is intended to improve the overall performance of spam detection, to detect changes in what a user treats as spam effectively, to retrain a user's classifier model when that user's interest is found to have changed, and thereby to adapt to changes in user needs or interests.

Description

Spam mail filtering method based on user interest
Technical field
The present invention relates to the field of network security, and in particular to a spam mail filtering method based on user interest.
Background art
In view of the serious social problems caused by spam, anti-spam strategies have received unprecedented attention in recent years. Many researchers have focused on the automated detection and filtering of spam and have proposed many methods, such as blacklists and machine learning (including Naive Bayes, Support Vector Machine, Neural Network, Boosting Trees, etc.).
To provide spam filtering services to users, mail service providers apply these spam filtering methods at the server level; however, their effectiveness is still unsatisfactory. One of the main problems is that existing server-side spam detection deployments do not distinguish between the interests of different users, cannot keep independent operating parameters and configurations for each user, and therefore cannot adapt to changes in user interest.
Existing server-side spam detection technology keeps a single set of spam detection parameters for all users and provides one uniform model. Such an implementation cannot handle differences in user interest (that is, differing definitions of spam and normal mail) or changes in user interest.
On the one hand, existing mail server implementations cannot satisfy the differing needs of users. In practice, users' interests are not the same. For example, given the same mail containing recruitment information, User A may regard it as normal mail because he is looking for a job, whereas User B may regard it as spam because he does not need this information. In this case, if the server applies unified parameter settings and a unified detection model to all users, it will inevitably give some users wrong spam detection results. If the server judges the mail to be spam, it gives User A a wrong result, and User A's normal mail is filtered out as spam; otherwise, it gives User B a wrong result and fails to filter the mail for User B.
On the other hand, existing mail server implementations cannot adapt to changes in user interest. Because the prior art applies unified detection parameter settings to all users, when the interest (the definition of spam) of some mail users changes, the server cannot adjust to these users' interests without negatively affecting other users (their interests have not changed, so adjusting the parameters would instead degrade detection performance for them).
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a new spam mail filtering method based on user interest. The method keeps independent parameter settings for each user (each corresponding to its own independent classifier model), so that mail classification results are produced for each user according to that user's interest. The scheme can also detect changes in each user's interest and adjust the corresponding classifier model in time. When the interests of only some users change, the scheme adjusts only the models of those users, retraining and reclassifying for them.
To achieve the above object, the invention provides a spam mail filtering method based on user interest, comprising the following steps:
S1: after each user receives a mail, parse each user's mail to obtain the title, text, recipient address and sender address of the mail, wherein the recipient address is used to select and determine the corresponding detector set and classifier model;
S2: segment the title and text of the mail into words; generate a feature vector from the segmented title and text and the detector set; train on each user's own training set to generate an independent classifier model for each user; when a new mail is received, classify it with the classifier model of the corresponding user; when a change in user interest is detected, retrain the classifier model of the corresponding user with the mail, wherein the user interest is expressed through the user's own definition of spam.
The classifier model is a support vector unit (that is, a plurality of support vector machines).
When user interest changes, a sliding window composed of a plurality of support vector machines is used to update the detector set and the feature vector.
The retraining step in step S2 is specifically: using a sliding window composed of a plurality of support vector machines together with the new mail, retrain the user's classifier model, adjusting the corresponding support vector unit and detector set of each user in turn.
The classification step in step S2 is specifically: input the feature vector into the support vector unit of the user determined by the recipient address; the returned classification result determines the class of the mail.
The above technical solution has the following advantages: by setting different classifier parameters and keeping a different classifier model for each user, the overall performance of spam detection can be improved; by using a sliding window composed of a plurality of support vector machines, changes in user interest can be detected effectively, and once a change is detected, the classifier model of the corresponding user is retrained to adapt to the change in the user's needs.
Description of drawings
Fig. 1 is a flow chart of the method according to an embodiment of the invention.
Detailed description
Specific embodiments of the invention are described in further detail below with reference to the drawing and the embodiment. The following embodiment illustrates the invention and is not intended to limit its scope.
As shown in Fig. 1, a spam mail filtering method based on user interest according to an embodiment of the invention comprises the following steps:
S1: after each user receives a mail, parse each user's mail to obtain the title, text, recipient address and sender address of the mail, wherein the recipient address is used to select and determine the corresponding detector set and classifier model;
S2: segment the title and text of the mail into words; generate a feature vector from the segmented title and text and the detector set; train on each user's own training set to generate an independent classifier model for each user; when a new mail is received, classify it with the classifier model of the corresponding user; when a change in user interest is detected, retrain the classifier model of the corresponding user with the mail, wherein the user interest is expressed through the user's own definition of spam.
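As an illustration only, step S1 and the feature-generation part of step S2 might be sketched as follows. The patent does not specify the mail format, the word segmenter or the internal form of the detector set, so everything here is an assumption: the mail is taken to be an RFC 5322 style message, `tokenize` is a hypothetical regex stand-in for a real word segmenter (Chinese text would need a proper one), the detector set is assumed to be a plain list of terms, and the feature vector is assumed to be a 0/1 indicator per detector.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def parse_mail(raw: str) -> dict:
    """Step S1: extract title, text, recipient address and sender address."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # For multipart mail, take the first text/plain part as the text.
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
        body = parts[0].get_payload() if parts else ""
    else:
        body = msg.get_payload()
    return {
        "title": msg.get("Subject", ""),
        "body": body,
        "recipient": parseaddr(msg.get("To", ""))[1].lower(),
        "sender": parseaddr(msg.get("From", ""))[1].lower(),
    }


def tokenize(text: str) -> list[str]:
    # Hypothetical segmenter: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_vector(mail: dict, detectors: list[str]) -> list[float]:
    """Step S2 (first part): one 0/1 component per detector term (assumed form)."""
    tokens = set(tokenize(mail["title"]) + tokenize(mail["body"]))
    return [1.0 if d in tokens else 0.0 for d in detectors]


raw = (
    "From: HR Team <hr@example.com>\n"
    "To: User A <user.a@example.org>\n"
    "Subject: Job opening\n"
    "\n"
    "We are hiring engineers. Apply today.\n"
)
mail = parse_mail(raw)
detectors = ["job", "hiring", "lottery", "apply"]  # hypothetical detector set
vec = feature_vector(mail, detectors)
```

The recipient address recovered here is what the method uses to pick the per-user detector set and classifier model; the sender address is extracted but not used in this sketch.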
The classifier model is a support vector unit.
When user interest changes, a sliding window composed of a plurality of support vector machines is used to update the detector set and the feature vector.
The retraining step in step S2 is specifically: using a sliding window composed of a plurality of support vector machines together with the new mail, retrain the user's classifier model, adjusting the corresponding support vector unit and detector set of each user in turn.
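The text does not say how the sliding window of support vector machines is organised, so the following is only one plausible reading, not the patented procedure: each user keeps the most recent few SVMs, each trained on one batch of that user's labelled mail; a new batch trains a new SVM that pushes the oldest one out of the window, and the window classifies by majority vote. The class name `SlidingWindowEnsemble`, the window size, the toy three-term feature layout and the plain subgradient-descent trainer for a linear SVM are all assumptions made for illustration.

```python
from collections import deque


def train_linear_svm(samples, labels, epochs=100, eta=0.1, lam=0.01):
    """Linear SVM trained by subgradient descent on the regularised hinge loss.

    labels are +1 (spam) or -1 (normal). Returns (weights, bias).
    """
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


class SlidingWindowEnsemble:
    """Keeps the most recent `size` SVMs for one user (assumed design)."""

    def __init__(self, size=3):
        self.models = deque(maxlen=size)  # the oldest model drops out automatically

    def retrain(self, samples, labels):
        self.models.append(train_linear_svm(samples, labels))

    def classify(self, x):
        votes = sum(predict(m, x) for m in self.models)
        return 1 if votes > 0 else -1  # a tie falls back to "normal"


# Toy features: [contains "job", contains "lottery", contains "meeting"].
JOB, LOTTERY, MEETING = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
batch = [JOB, LOTTERY, MEETING]
old_labels = [1, 1, -1]   # the user first treats recruitment mail as spam
new_labels = [-1, 1, -1]  # later the user is job hunting: recruitment mail is normal

window = SlidingWindowEnsemble(size=3)
for _ in range(3):
    window.retrain(batch, old_labels)
before = window.classify(JOB)
window.retrain(batch, new_labels)
after_one = window.classify(JOB)
window.retrain(batch, new_labels)
after_two = window.classify(JOB)
```

With a window of three, a single batch labelled under the new interest is outvoted by the two older models, and the verdict on recruitment mail changes once the newer models form the majority. A smaller window would react faster at the cost of stability; the patent gives no guidance on this trade-off.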
The classification step in step S2 is specifically: input the feature vector into the support vector unit of the user determined by the recipient address; the returned classification result determines the class of the mail.
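A minimal sketch of this routing step, under stated assumptions: each recipient address maps to its own detector set and its own classifier, here reduced to a single hand-set linear decision function rather than a trained support vector unit. The `UserModel` class, the `MODELS` registry, the addresses and the weights are all hypothetical and chosen only to reproduce the recruitment-mail example from the background section.

```python
from dataclasses import dataclass


@dataclass
class UserModel:
    detectors: list[str]   # per-user detector set (assumed to be terms)
    weights: list[float]   # linear classifier weights (hand-set, not trained)
    bias: float

    def classify(self, tokens: set[str]) -> str:
        # Build this user's feature vector, then apply this user's classifier.
        x = [1.0 if d in tokens else 0.0 for d in self.detectors]
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return "spam" if score >= 0 else "normal"


# Hypothetical registry: one independent model per recipient address.
MODELS = {
    # User A is job hunting, so "job" pushes the score toward normal mail.
    "user.a@example.org": UserModel(["job", "lottery"], [-1.0, 2.0], -0.5),
    # User B does not want recruitment mail, so "job" pushes toward spam.
    "user.b@example.org": UserModel(["job", "lottery"], [2.0, 2.0], -0.5),
}


def classify_mail(recipient: str, tokens: set[str]) -> str:
    """Select the model by recipient address and return its verdict."""
    return MODELS[recipient.lower()].classify(tokens)


tokens = {"job", "opening", "apply"}
result_a = classify_mail("user.a@example.org", tokens)
result_b = classify_mail("user.b@example.org", tokens)
```

The same recruitment mail is routed to two different models and receives two different verdicts, which is the behaviour a single server-wide model cannot provide.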
The above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims (5)

1. A spam mail filtering method based on user interest, characterized by comprising the following steps:
S1: after each user receives a mail, parse each user's mail to obtain the title, text, recipient address and sender address of the mail, wherein the recipient address is used to select and determine the corresponding detector set and classifier model;
S2: segment the title and text of the mail into words; generate a feature vector from the segmented title and text and the detector set; train on each user's own training set to generate an independent classifier model for each user; when a new mail is received, classify it with the classifier model of the corresponding user; when a change in user interest is detected, retrain the classifier model of the corresponding user with the mail, wherein the user interest is expressed through the user's own definition of spam.
2. The spam mail filtering method based on user interest according to claim 1, characterized in that the classifier model is a support vector unit.
3. The spam mail filtering method based on user interest according to claim 2, characterized in that, when user interest changes, a sliding window composed of a plurality of support vector machines is used to update the detector set and the feature vector.
4. The spam mail filtering method based on user interest according to claim 2, characterized in that the retraining step in step S2 is specifically: according to a sliding window composed of a plurality of support vector machines, adjusting the corresponding support vector unit and detector set of each user in turn.
5. The spam mail filtering method based on user interest according to claim 2, characterized in that the classification step in step S2 is specifically: inputting the feature vector into the support vector unit of the user determined by the recipient address, the returned classification result determining the class of the mail.
CN200910242936A 2009-12-21 2009-12-21 Spam mail filtering method based on user interest Pending CN101764765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910242936A CN101764765A (en) 2009-12-21 2009-12-21 Spam mail filtering method based on user interest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910242936A CN101764765A (en) 2009-12-21 2009-12-21 Spam mail filtering method based on user interest

Publications (1)

Publication Number Publication Date
CN101764765A true CN101764765A (en) 2010-06-30

Family

ID=42495757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910242936A Pending CN101764765A (en) 2009-12-21 2009-12-21 Spam mail filtering method based on user interest

Country Status (1)

Country Link
CN (1) CN101764765A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793838A (en) * 2014-01-26 2014-05-14 宇龙计算机通信科技(深圳)有限公司 Advertisement intercepting method and device
CN104794176A (en) * 2015-04-02 2015-07-22 中国科学院信息工程研究所 Multiattribute-based detection method for missent e-mail
CN106230690A (en) * 2016-07-25 2016-12-14 华中科技大学 The process for sorting mailings of a kind of combination user property and system
CN106230690B (en) * 2016-07-25 2019-06-11 华中科技大学 A kind of process for sorting mailings and system of combination user property
CN108347421A (en) * 2017-03-31 2018-07-31 北京安天网络安全技术有限公司 A kind of malious email detection method and system based on content
CN108347421B (en) * 2017-03-31 2020-06-19 北京安天网络安全技术有限公司 Malicious mail detection method and system based on content
CN108073718A (en) * 2017-12-29 2018-05-25 长春理工大学 A kind of mail two classification algorithm based on Active Learning and Negative Selection
CN110213152A (en) * 2018-05-02 2019-09-06 腾讯科技(深圳)有限公司 Identify method, apparatus, server and the storage medium of spam
CN110213152B (en) * 2018-05-02 2021-09-14 腾讯科技(深圳)有限公司 Method, device, server and storage medium for identifying junk mails
CN109831373A (en) * 2019-03-01 2019-05-31 论客科技(广州)有限公司 The anti-erroneous judgement method and device of mailing system high-precision intelligent based on FastText algorithm

Similar Documents

Publication Publication Date Title
CN101764765A (en) Spam mail filtering method based on user interest
CN104463552B (en) Calendar reminding generation method and device
CN101257671B (en) Method for real time filtering large scale rubbish SMS based on content
US20090292781A1 (en) Method for filtering e-mail and mail filtering system thereof
CN101345720B (en) Junk mail classification method based on partial match estimation
CN103024746A (en) System and method for processing spam short messages for telecommunication operator
CN105224604B (en) A kind of microblogging incident detection method and its detection device based on heap optimization
CN108173704A (en) A kind of method and device of the net flow assorted based on representative learning
CN105812554A (en) Method and system for intelligently managing text messages in mobile phones
CN110428007A (en) X-ray image object detection method, device and equipment
CN105871887A (en) Client-side based personalized E-mail filtering system and method
CN106228106B (en) A kind of improved real-time vehicle detection filter method and system
CN100556039C (en) Eliminate the method and system of spam erroneous judgement
CN101719924B (en) Unhealthy multimedia message filtering method based on groupware comprehension
CN112887326A (en) Intrusion detection method based on edge cloud cooperation
CN106095747A (en) The recognition methods of a kind of refuse messages and system
CN105721539B (en) A kind of SMS classified device and method of Behavior-based control feature
CN113721299B (en) Subway security inspection mode configuration management method
CN101877066A (en) Anti-image spam method and device
Jawale et al. Hybrid spam detection using machine learning
CN106897423A (en) A kind of cloud platform junk data processing method and system
CN103595614A (en) User feedback based junk mail detection method
Luo et al. Design and implement a rule-based spam filtering system using neural network
CN106230690B (en) A kind of process for sorting mailings and system of combination user property
EP2388700A3 (en) Systems and methods for policy-based program configuration

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100630