CN106100973A - Personalized spam filtering method and filtering device based on node similarity - Google Patents


Info

Publication number
CN106100973A
Authority
CN
China
Prior art keywords
user
spam
interest
similarity
mail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610408178.9A
Other languages
Chinese (zh)
Inventor
刘昕
邹苹钧
王奕文
王丰
辛兆君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN201610408178.9A
Publication of CN106100973A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21: Monitoring or handling of messages
    • H04L51/212: Monitoring or handling of messages using filtering or selective blocking
    • H04L51/42: Mailbox-related aspects, e.g. synchronisation of mailboxes

Abstract

Embodiments of the present invention provide a spam filtering method. In the personalized spam filtering method based on node similarity, an e-mail user obtains, through the social network to which the user belongs, the spam information held by the user's trusted friends; based on the interest similarity between users, individual knowledge is pooled into collective intelligence to filter spam. Embodiments of the present invention also provide a personalized spam filtering device based on node similarity. The technical solution provided by the embodiments can filter spam promptly and improves filtering accuracy.

Description

Personalized spam filtering method and filtering device based on node similarity
Technical field
The present invention relates to a spam filtering method, and in particular to a personalized spam filtering method and filtering device based on node similarity.
Background
E-mail is an indispensable means of communication in daily life, but the spam that accompanies it spreads like a plague, polluting and damaging the network environment. Against this background, anti-spam research has become an important topic in reliable network communication.
Traditional spam filtering techniques fall into three groups: filtering based on blacklists/whitelists, filtering based on matching rules, and filtering based on mail content. Blacklist/whitelist filtering is the most widely applied spam filtering technique. A blacklist can be a list of IP addresses of mail servers or senders, domain names, or e-mail addresses; any mail whose source appears in the blacklist is identified as spam. The whitelist works analogously: any mail whose source appears in the whitelist is treated as legitimate. Although this technique is simple and easy to configure, it performs poorly in practice: both the false negative rate and the false positive rate are high, and updating and maintaining the lists in real time is difficult.
Among filtering methods based on matching rules, the most widely used is filtering based on the Bayesian algorithm. A filter using the Bayesian algorithm can reach a filtering accuracy above 90%, but the technique requires a centralized filter, cannot make full use of the information in the network, suffers in filtering efficiency, and carries the risk of a single point of failure. Nevertheless, because of its high accuracy, the Bayesian approach is still worth drawing on.
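As an illustration of the kind of Bayesian filter mentioned here, the following is a minimal sketch. The text does not say which Bayesian variant is used; this sketch assumes a multinomial naive Bayes classifier with add-one (Laplace) smoothing over whitespace-separated words, and the class and method names are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes filter (an assumed variant, not the patent's)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one labelled mail; label must be 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def is_spam(self, text: str) -> bool:
        """Compare log-posteriors; requires at least one training mail per class."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Class prior plus smoothed per-word likelihoods, in log space.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return scores["spam"] > scores["ham"]
```

A real filter would need tokenization, feature selection, and a tuned decision threshold; the sketch only shows the scoring principle.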
Content-based methods focus on the mail text and filter mail using text classification and information filtering techniques. They can extract spam features automatically and are fairly accurate, but they have some unavoidable limitations: (1) Text characteristics. The method depends heavily on the text content; for well-disguised spam, it is difficult to separate it from normal mail by text filtering alone. (2) Filtering efficiency. The method requires a large amount of matching and computation on the content of every mail and consumes considerable memory and CPU. Moreover, when many users receive the same mail, the computation and filtering are repeated for each of them, which greatly reduces filtering efficiency.
Analysis shows that the most prominent feature of spam is mass mailing: the sender delivers the same spam message to a large number of recipients. Traditional spam filtering, however, largely lacks any analysis of the similarity between users. Therefore, if all e-mail users in the network can take part in spam filtering and share with one another what each knows about spam, the shortcomings of content-based filtering can be compensated to a large extent.
Summary of the invention
To solve the problems of the prior art, the present invention provides a personalized spam filtering method based on node similarity. The method uses social trust values and interest similarity to calculate how credible a friend's spam report is, pools individual knowledge into collective intelligence, and obtains the spam filtering information held by the user's trusted friends fully and promptly, thereby filtering spam.
The technical solution adopted by the present invention is as follows:
A personalized spam filtering method based on node similarity, comprising the following steps:
A. According to the social relationships and similarity of users, define the social trust degree and the interest similarity between users;
B. Taking social network users as nodes and the relationships between a user and the user's contacts as edges, build the topology graph of the social network. Build the trust value list of the user's friends; build the user's local interest list and calculate the interest similarity between users from interest keywords; build the user's local spam list from the spam information obtained;
C. When the user receives a mail, perform first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, mark it as spam and store it in the local spam list in the form of a spam report;
D. The user applies node similarity to perform second-layer filtering according to the spam reports received;
E. Send spam reports to the friend nodes that reach the trust value threshold and the interest similarity threshold.
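The per-node state that step B sets up (friend trust values, local interest list, local spam list) can be pictured with a small data structure. This is only a sketch under assumptions: the class name `MailNode`, its field names, and the helper `add_friend` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class MailNode:
    """One e-mail user in the social graph (all names here are illustrative)."""
    user_id: str
    trust: dict[str, float] = field(default_factory=dict)  # friend id -> trust value
    interests: set[str] = field(default_factory=set)       # local interest list
    spam_hashes: set[str] = field(default_factory=set)     # local spam list


def add_friend(graph: dict[str, MailNode], a: str, b: str,
               trust_ab: float, trust_ba: float) -> None:
    """Add an edge between users a and b, recording each side's own trust value."""
    graph.setdefault(a, MailNode(a)).trust[b] = trust_ab
    graph.setdefault(b, MailNode(b)).trust[a] = trust_ba
```

Trust is kept per direction here because the text describes it as the trust user A places in user B, which need not equal B's trust in A.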
In step A, social trust between users means: within the system environment of the user, the direct trust that user A places in user B, derived from the history of A's direct contact with B. Interest similarity means: if two users share the same interests, interest similarity is considered to exist between them.
In step B, a friend is another node with which the user has frequent direct contact. The interest similarity is calculated with the Jaccard coefficient:
JC = M11 / (M10 + M01 + M11), where M11 is the number of interest keywords common to the two users' interest lists, and M10 and M01 are the numbers of interest keywords unique to one user and to the other, respectively.
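The Jaccard coefficient above maps directly onto set operations. A minimal sketch, assuming each interest list is represented as a set of keyword strings; the function name and the convention of returning 0.0 for two empty lists are assumptions.

```python
def jaccard_similarity(interests_a: set[str], interests_b: set[str]) -> float:
    """JC = M11 / (M10 + M01 + M11) over two users' interest keyword sets."""
    m11 = len(interests_a & interests_b)  # keywords shared by both users
    m10 = len(interests_a - interests_b)  # keywords only user A has
    m01 = len(interests_b - interests_a)  # keywords only user B has
    denominator = m10 + m01 + m11
    # Assumption: two empty interest lists count as having no similarity.
    return m11 / denominator if denominator else 0.0
```

For example, the lists {sports, music, travel} and {music, travel, film} share two keywords out of four distinct ones, giving JC = 0.5.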
In step B, the spam information obtained consists of the spam the user has marked and the spam reports that other contacts have sent to this node; it is stored as the MD5 hash of the mail content.
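Storing spam as an MD5 hash of the mail content could look like the sketch below. The text does not say how the content is encoded or normalized before hashing; UTF-8 encoding of the raw body is an assumption, and the function name is hypothetical.

```python
import hashlib


def mail_fingerprint(body: str) -> str:
    """Return the hex MD5 digest of a mail body (UTF-8 encoding is assumed)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# The local spam list can then be a plain set of digests.
local_spam_list: set[str] = set()
local_spam_list.add(mail_fingerprint("Claim your prize today"))
```

Because MD5 is an exact-match fingerprint, two copies of a mail match only if their hashed content is byte-identical; any per-recipient variation in the body produces a different digest.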
In step C, after the user receives a mail, the mail is first matched against the spam information in the local spam list. If no matching entry is found, the Bayesian algorithm is applied, and the local spam list is updated once the matching is complete.
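The order of checks in step C (local spam list first, Bayesian filter second) can be sketched as follows. The Bayesian classifier is passed in as a callable so the sketch stays independent of any particular implementation; the function and parameter names are assumptions.

```python
import hashlib
from typing import Callable


def filter_incoming(body: str, spam_hashes: set[str],
                    bayes_is_spam: Callable[[str], bool]) -> bool:
    """Return True if the mail is treated as spam; update the local list if needed."""
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    if digest in spam_hashes:
        # Already known locally: no need to run the classifier again.
        return True
    if bayes_is_spam(body):
        # Newly detected spam is recorded so later copies match by hash alone.
        spam_hashes.add(digest)
        return True
    return False
```

Checking the hash list first means a mail already reported by the user or a friend is rejected with a set lookup rather than a full content classification.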
In step D, when the user receives a spam report from another user, the report is ignored if it already exists in the user's local spam list. Otherwise, if the corresponding mail is in the user's inbox, the mail is moved to the trash folder and stored in the local spam list in the form of a spam report.
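The handling of an incoming report in step D can be sketched as a three-way decision. The text does not say what happens when the report is new but the mail is not in the inbox; the sketch assumes nothing is stored in that case. The inbox and trash are modelled as dictionaries keyed by MD5 digest, and all names are hypothetical.

```python
def handle_report(report_md5: str, spam_hashes: set[str],
                  inbox: dict[str, str], trash: dict[str, str]) -> str:
    """Process one friend's spam report and return what was done with it."""
    if report_md5 in spam_hashes:
        return "ignored"            # report already known locally
    if report_md5 in inbox:
        # Move the matching mail to trash and remember the report.
        trash[report_md5] = inbox.pop(report_md5)
        spam_hashes.add(report_md5)
        return "moved"
    # Assumption: no matching mail, so nothing is stored.
    return "no_match"
```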
In step E, when the trust value between users reaches the trust value threshold and the interest similarity reaches the interest similarity threshold, a user's spam report is automatically pushed to the other users. A spam report contains the MD5 hash of the spam content, the interest similarity between the users, and a flag bit marking the mail as spam.
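The report contents and the two-threshold push rule of step E can be sketched as below. "Reaches the threshold" is read here as greater than or equal, which is an assumption, as are the class, field, and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamReport:
    content_md5: str            # MD5 hash of the spam content
    interest_similarity: float  # similarity between sender and recipient
    is_spam: bool = True        # flag bit marking the mail as spam


def build_reports(content_md5: str, trust: dict[str, float],
                  similarity: dict[str, float], trust_threshold: float,
                  similarity_threshold: float) -> dict[str, SpamReport]:
    """Return one report per friend that reaches both thresholds."""
    return {
        friend: SpamReport(content_md5, similarity[friend])
        for friend, trust_value in trust.items()
        if trust_value >= trust_threshold
        and similarity.get(friend, 0.0) >= similarity_threshold
    }
```

A friend who is highly trusted but shares few interests, or the reverse, receives no report under this rule.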
In another aspect, the present invention provides a personalized spam filtering device based on node similarity, comprising the following modules:
Definition module: according to the social relationships and similarity of users, defines the social trust degree and the interest similarity between users;
Building module: takes social network users and their friends as nodes; builds the user's friend trust value list; builds the user's local interest list and calculates the interest similarity between users from interest keywords; builds the user's local spam list from the spam information;
Bayesian filtering module: when the user receives a mail, performs first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, it is marked as spam and stored in the local spam list in the form of a spam report;
Personalized filtering module based on node similarity: performs second-layer filtering based on node similarity according to the spam reports the user receives;
Sending module: sends spam reports to the friend nodes that reach the trust value threshold and the interest similarity threshold.
The technical solution and device provided by the present invention have the following beneficial effects:
By building a social network model of mail users and taking both the social trust values and the interest similarity between users into account, the present invention applies an interest similarity algorithm to calculate how credible a friend's spam report is, obtains the spam filtering information held by trusted friends promptly, and improves the accuracy of spam filtering.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only one embodiment of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a topology diagram of a simple mail network in the personalized spam filtering method based on node similarity of the present invention;
Fig. 2 is a schematic diagram of spam filtering by the personalized spam filtering device based on node similarity provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the personalized spam filtering device based on node similarity provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment one
This embodiment presupposes that the system parameters are set sensibly in advance, in order to improve spam filtering precision.
Interests differ between users, and so do the trust values between them. In the personal settings, the interest keywords used by the system are therefore predefined, and each user selects the keywords of interest, or of no interest, according to personal preference. The user sets an initial trust value for each friend according to how much the user trusts that friend, and stores it in the user's local trust value table. The interest similarity between the user and each friend is calculated with the Jaccard coefficient formula and stored in the user's local interest list. The system updates the user's trust value for a friend by comparing the interest similarity with the configured interest similarity threshold. Each node also maintains a local spam list that holds the computed MD5 values of spam, so that spam information can be stored and shared.
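The trust update described here corresponds to the rule T = Tij + b given in claim 3: the trust value grows by a configured increment b when the interest similarity is above the threshold. A minimal sketch; the function and parameter names are assumptions, and no upper bound on the trust value is applied because the text does not state one.

```python
def update_trust(current_trust: float, interest_similarity: float,
                 similarity_threshold: float, increment: float) -> float:
    """Apply T = Tij + b when the similarity is above the threshold."""
    if interest_similarity > similarity_threshold:
        return current_trust + increment
    # Otherwise the trust value stays at its current (initially user-set) level.
    return current_trust
```

With an initial trust of 0.5, a similarity of 0.6, a threshold of 0.4, and b = 0.1, the updated trust is 0.6; with a similarity of 0.2 it remains 0.5.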
After the user receives a mail, the system first calculates the MD5 hash of the mail content and matches it against the spam information stored in the local spam list. If a match exists, the mail is marked as spam and moved to the trash folder. Otherwise, the system hands the mail to the spam filter based on the Bayesian algorithm (the first-layer filter) to judge whether it is spam. If it is judged to be spam, the system moves it to the trash folder and stores its information in the local spam list. At the same time, the interest-based spam filter (the second-layer filter) pushes the spam report to the friends whose trust value and interest similarity with this user exceed the thresholds. If the user receives a spam report from a friend, the system judges whether a mail is spam by matching the received report against the mail.
After a friend node of the user receives the spam report, its local system first checks whether the report already exists in the local spam list. If it does, the report is ignored and not processed. Otherwise, if the mail corresponding to the report is in the user's inbox, the system moves the mail to the trash folder, stores the report in the local spam list, and then pushes it on to the next nodes.
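The hop-by-hop forwarding described in this embodiment can be simulated on a small graph to show why it terminates: a node that already holds the report ignores it. This is a sketch under assumptions. `friends` is taken to list only the friends that already pass both thresholds, a node without the mail in its inbox is assumed neither to store nor to forward the report, and all names are hypothetical.

```python
from collections import deque


def propagate(origin: str, digest: str, friends: dict[str, list[str]],
              inboxes: dict[str, set[str]],
              spam_lists: dict[str, set[str]]) -> set[str]:
    """Push a report outward from origin; return the nodes that filed it."""
    filed: set[str] = set()
    queue = deque(friends.get(origin, []))
    while queue:
        node = queue.popleft()
        if digest in spam_lists[node]:
            continue                      # report already known: ignore it
        if digest not in inboxes[node]:
            continue                      # assumption: no matching mail, stop here
        inboxes[node].discard(digest)     # move the mail out of the inbox
        spam_lists[node].add(digest)      # store the report locally
        filed.add(node)
        queue.extend(friends.get(node, []))  # next round of pushing
    return filed
```

Each node files a given report at most once, so the number of forwarding rounds is bounded by the number of nodes even when the friend graph contains cycles.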

Claims (7)

1. A personalized spam filtering method based on node similarity, comprising the following steps:
A. According to the social relationships and similarity of users, define the social trust degree and the interest similarity between users;
B. Taking social network users as nodes and the relationships between a user and the user's contacts as edges, build the topology graph of the social network. Build the trust value list of the user's friends; build the user's local interest list and calculate the interest similarity between users from interest keywords; build the user's local spam list from the spam information obtained;
C. When the user receives a mail, perform first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, mark it as spam and store it in the local spam list in the form of a spam report;
D. The user applies node similarity to perform second-layer filtering according to the spam reports received;
E. Send spam reports to the friend nodes that reach the trust value threshold and the interest similarity threshold.
2. The spam filtering method according to claim 1, characterized in that, in step A, social trust between users means: within the system environment of the user, the trust that user A places in user B, derived from the history of A's direct contact with B; interest similarity means: if two users share the same interests, interest similarity is considered to exist between them, and the more interest keywords they share, the higher the similarity.
3. The spam filtering method according to claim 1, characterized in that, in step B, the trust value is calculated as T = Tij + b; that is, starting from the configured initial trust value, if the interest similarity between friends is higher than the threshold, the trust value increases by the configured increment b. The interest similarity is calculated with the Jaccard coefficient: JC = M11 / (M10 + M01 + M11), where M11 is the number of interest keywords common to the two users' interest lists, and M10 and M01 are the numbers of interest keywords unique to one user and to the other, respectively.
In step B, the spam information obtained consists of the spam the user has marked and the spam reports that other contacts have sent to this node; it is stored as the MD5 hash of the mail content.
4. The spam filtering method according to claim 1, characterized in that, in step C, after the user receives a mail, the mail is first matched against the spam information in the local spam list; if no matching entry is found, the Bayesian algorithm is applied, and the local spam list is updated once the matching is complete.
5. The spam filtering method according to claim 1, characterized in that, in step D, when the user receives a spam report from another user, the report is ignored if it already exists in the user's local spam list. Otherwise, if the corresponding mail is in the user's inbox, the mail is moved to the trash folder and stored in the local spam list in the form of a spam report.
6. The spam filtering method according to claim 1, characterized in that, in step E, when the trust value between users reaches the trust value threshold and the interest similarity reaches the interest similarity threshold, a user's spam report is automatically pushed to the other users; a spam report contains the MD5 hash of the spam content, the interest similarity between the users, and a flag bit marking the mail as spam.
7. A personalized spam filtering device based on node similarity, comprising the following modules:
Definition module: according to the social relationships and similarity of users, defines the social trust degree and the interest similarity between users;
Building module: takes social network users and their friends as nodes; builds the user's friend trust value list; builds the user's local interest list and calculates the interest similarity between users from interest keywords; builds the user's local spam list from the spam information;
Bayesian filtering module: when the user receives a mail, performs first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, it is marked as spam and stored in the local spam list in the form of a spam report;
Personalized filtering module based on node similarity: when the user receives a spam report from another user, applies node similarity to perform second-layer filtering;
Sending module: sends spam reports to the friend nodes that reach the trust value threshold and the interest similarity threshold.
CN201610408178.9A 2016-06-07 2016-06-07 Personalized spam filtering method and filtering device based on node similarity Pending CN106100973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610408178.9A CN106100973A (en) 2016-06-07 2016-06-07 Personalized spam filtering method and filtering device based on node similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610408178.9A CN106100973A (en) 2016-06-07 2016-06-07 Personalized spam filtering method and filtering device based on node similarity

Publications (1)

Publication Number Publication Date
CN106100973A (en) 2016-11-09

Family

ID=57228760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610408178.9A Pending CN106100973A (en) 2016-06-07 2016-06-07 Personalized spam filtering method and filtering device based on node similarity

Country Status (1)

Country Link
CN (1) CN106100973A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107171944A (en) * 2017-06-27 2017-09-15 北京二六三企业通信有限公司 Spam recognition method and device
CN109639838A (en) * 2019-02-13 2019-04-16 广州秦耀照明电器有限公司 An information classification storage system based on big data
CN110753024A (en) * 2018-07-23 2020-02-04 南京航空航天大学 Personalized mail re-filtering method in collective environment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778213A (en) * 2015-03-19 2015-07-15 同济大学 Social network recommendation method based on random walk

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIN LIU ET AL.: ""A decentralized and personalized spam filter based on social computing"", 《2014 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING CONFERENCE (IWCMC)》 *
ZE LI ET AL.: ""Soap: A social network aided personalized and effective spam filter to clean your e-mail box"", 《2011 PROCEEDINGS IEEE INFOCOM》 *
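刘昕 et al.: "Cooperative defense mechanism against malicious web pages based on social trust", Journal on Communications (《通信学报》) *
贺银慧: "Research on user trust relationships in social networks and its applications", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库(信息科技辑)》) *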

Similar Documents

Publication Publication Date Title
US10911383B2 (en) Spam filtering and person profiles
Golbeck et al. Reputation Network Analysis for Email Filtering.
US9311415B2 (en) Generating contact suggestions
US8688793B2 (en) System and method for insertion of addresses in electronic messages
Wang Detecting spam bots in online social networking sites: a machine learning approach
US10778624B2 (en) Systems and methods for spam filtering
US7469292B2 (en) Managing electronic messages using contact information
US8600965B2 (en) System and method for observing communication behavior
US20040044536A1 (en) Providing common contact discovery and management to electronic mail users
US8296372B2 (en) Method and system for merging electronic messages
CN106100973A (en) Personalized spam filtering method and filtering device based on node similarity
CN103297317B (en) A kind of send mail method, a kind of electronic equipment
Murukannaiah et al. Platys Social: Relating shared places and private social circles
CN103346956A (en) Method and device for expanding social relation in social network
Diehl et al. Name reference resolution in organizational email archives
CN102299868A (en) Method, client and system for transmitting and receiving email
JP2005244647A (en) Community forming device
CN103841121B (en) A kind of comment and interaction systems and method based on local file
CN109685129A (en) A kind of multiclass social application subject information cluster association method based on smart phone
Blosser et al. Privacy preserving collaborative social network
US20160072752A1 (en) Filtering electronic messages based on domain attributes without reputation
Moniza et al. An assortment of spam detection system
Bhalerao et al. Improved social network aided personalized spam filtering approach using RBF neural network
Li et al. Sentiment Diffusion of Social Inequality in Microblogs: A Case Study of “Migrant Worker” in Sina Weibo
Limanto et al. Social Media Spamming on the Class Room Electronic

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161109
