CN106100973A - Personalized spam filtering method and filtering device based on node similarity - Google Patents
- Publication number
- CN106100973A (application CN201610408178.9A)
- Authority
- CN
- China
- Prior art keywords
- user
- spam
- interest
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/42—Mailbox-related aspects, e.g. synchronisation of mailboxes
Abstract
Embodiments of the invention provide a spam filtering method. The personalized spam filtering method based on node similarity uses the social network to which an email user belongs to obtain the spam information held by the user's trusted friends and, based on the interest similarity between users, pools individual knowledge into group intelligence to filter spam. Embodiments of the invention also provide a personalized spam filtering device based on node similarity. The technical solution provided by the embodiments filters spam promptly and improves filtering accuracy.
Description
Technical field
The present invention relates to a spam filtering method, and in particular to a personalized spam filtering method and filtering device based on node similarity.
Background art
Email is an indispensable means of communication in daily life, but the spam that accompanies it spreads like a plague, polluting and damaging the network environment. Against this background, anti-spam research has become an important topic in reliable network communication.
Traditional spam filtering techniques fall into three groups: filtering based on blacklists and whitelists, filtering based on matching rules, and filtering based on mail content. Blacklist/whitelist filtering is the most widely applied. A blacklist can be a list of IP addresses of mail servers or senders, domain names, or email addresses; any mail whose source appears in the blacklist is identified as spam. Conversely, any mail whose source appears in the whitelist is treated as legitimate. Although this technique is simple and easy to configure, it performs poorly in practice: both the false negative rate and the false positive rate are high, and the lists are difficult to update in real time and to maintain.
Among the filtering methods based on matching rules, the most widely used is filtering based on the Bayesian algorithm. A filter using the Bayesian algorithm can reach a filtering accuracy of more than 90%, but this technique requires a centralized filter, so it cannot make full use of the information in the network, its filtering efficiency suffers, and it carries the risk of a single point of failure. Nevertheless, because of its high accuracy, the Bayesian approach is still worth drawing on.
Content-based methods focus on the mail text and filter mail by text classification and information filtering. They can extract spam features automatically and are fairly accurate, but they have some unavoidable limitations: (1) Dependence on the text. The method relies heavily on the text content, so well-disguised spam is difficult to separate from normal mail by text filtering alone. (2) Filtering efficiency. The method has to perform a large amount of matching and computation on the content of every mail, which consumes considerable memory and CPU; moreover, when many users receive the same mail, the computation and filtering are repeated for each of them, which greatly reduces filtering efficiency.
Analysis shows that the most prominent feature of spam is mass mailing: the sender delivers the same spam message to a large number of recipients. Traditional spam filtering, however, largely lacks any analysis of the similarity between users. Therefore, if all email users in the network could take part in spam filtering and share with one another what each of them knows about spam, the shortcomings of content-based filtering could be offset to a large extent.
Summary of the invention
To solve the problems of the prior art, the invention provides a personalized spam filtering method based on node similarity. The method uses social trust values and interest similarity to compute how credible a spam report from a user's friend is, pools individual knowledge into group intelligence, and obtains the spam filtering information held by the user's trusted friends fully and promptly, thereby filtering spam.
The technical solution adopted by the invention is as follows:
A personalized spam filtering method based on node similarity, comprising the following steps:
A. Define the social trust degree and the interest similarity between users according to the users' social relations and similarity;
B. Take social network users as nodes and the relations between users and their contacts as edges, and build the topology graph of the social network. Build the trust value list of the user's friends; build the user's local interest list and compute the interest similarity between users from interest keywords; build the user's local spam list from the spam information obtained;
C. When the user receives a mail, perform first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, mark it as spam and store it in the local spam list in the form of a spam report;
D. Perform second-layer filtering by applying node similarity to the spam reports the user receives;
E. Send the spam report to friend nodes that reach the trust value threshold and the interest similarity threshold.
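The per-node state that step B builds can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `MailNode` and all of its field names are hypothetical and do not come from the patent text, which only says that each user keeps a friend trust value list, a local interest list, and a local spam list.

```python
from dataclasses import dataclass, field

@dataclass
class MailNode:
    """Hypothetical container for the lists one user (one graph node) keeps locally."""
    user: str
    friend_trust: dict = field(default_factory=dict)       # friend name -> trust value
    interests: set = field(default_factory=set)            # the user's interest keywords
    friend_similarity: dict = field(default_factory=dict)  # friend name -> interest similarity
    spam_list: set = field(default_factory=set)            # MD5 digests of known spam

# Edges of the social graph are implied by the keys of friend_trust.
alice = MailNode("alice", interests={"music", "travel"})
alice.friend_trust["bob"] = 0.8
print(sorted(alice.friend_trust))  # ['bob']
```

Keeping every list on the node itself reflects the decentralized design the text argues for: no central filter has to hold all users' data.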
In step A, social trust between users means the direct trust that user A places in user B, derived from the history of direct contact between A and B in the user's system environment. Interest similarity means that if two users have interests in common, interest similarity is considered to exist between them;
In step B, a friend is another node with which the user is in frequent direct contact, and the interest similarity is computed with the Jaccard coefficient:
JC = M11 / (M10 + M01 + M11), where M11 is the number of interest keywords common to the two users' interest lists, and M10 and M01 are the numbers of interest keywords that appear only in the first user's list and only in the second user's list, respectively.
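The Jaccard coefficient can be computed directly from two keyword lists. The following is a minimal sketch, assuming interests are plain strings; the function name `jaccard_similarity` and the handling of two empty lists are illustrative choices, not something the patent specifies.

```python
def jaccard_similarity(interests_a, interests_b):
    """Jaccard coefficient JC = M11 / (M10 + M01 + M11) over two keyword lists."""
    a, b = set(interests_a), set(interests_b)
    m11 = len(a & b)  # keywords that appear in both lists
    m10 = len(a - b)  # keywords only the first user lists
    m01 = len(b - a)  # keywords only the second user lists
    total = m10 + m01 + m11
    # Assumption: two empty lists share nothing, so their similarity is taken as 0.
    return m11 / total if total else 0.0

alice = ["travel", "finance", "music"]
bob = ["music", "finance", "sports", "cooking"]
print(jaccard_similarity(alice, bob))  # 2 shared out of 5 distinct keywords -> 0.4
```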
In step B, the spam information obtained consists of the spam the user has marked and the spam reports that other contacts have sent to this node; it is stored as the MD5 hash of the mail content.
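Storing spam as an MD5 hash of the mail content might look like the sketch below. The helper name `mail_fingerprint`, the UTF-8 encoding, and the use of a Python set for the local spam list are assumptions made for illustration; the patent only states that the MD5 hash of the content is stored.

```python
import hashlib

def mail_fingerprint(content: str) -> str:
    """Return the MD5 hex digest of the mail content (UTF-8 encoding is an assumption)."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()

# Hypothetical local spam list: a set of digests.
local_spam_list = set()
local_spam_list.add(mail_fingerprint("Win a free prize now!"))

print(mail_fingerprint("Win a free prize now!") in local_spam_list)  # True
print(mail_fingerprint("Win a free prize now?") in local_spam_list)  # False
```

Because a hash matches only byte-identical content, this lookup recognizes exact copies of a mass-mailed message; a message altered by even one character produces a different digest.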
In step C, after the user receives a mail, the mail is first matched against the spam information in the local spam list. If no matching entry is found, the Bayesian algorithm is used to classify it, and the local spam list is updated once classification is complete.
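The order of checks in step C can be sketched as follows. The function name `first_layer_filter` is hypothetical, and `toy_classifier` is a deliberately trivial stand-in for a trained Bayesian filter, which the patent does not detail.

```python
import hashlib

def first_layer_filter(content, local_spam_list, bayes_is_spam):
    """Check the local spam list first, then fall back to the classifier."""
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    if digest in local_spam_list:
        return True                  # already known as spam, no classification needed
    if bayes_is_spam(content):
        local_spam_list.add(digest)  # update the local spam list after classification
        return True
    return False

def toy_classifier(text):
    # Placeholder only: a real Bayesian filter would score word probabilities.
    return "lottery" in text.lower()

spam_list = set()
print(first_layer_filter("You won the LOTTERY", spam_list, toy_classifier))  # True
print(first_layer_filter("Lunch at noon?", spam_list, toy_classifier))       # False
print(len(spam_list))  # 1
```

Checking the digest before classifying means a message that has already been reported costs one set lookup instead of a full content analysis.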
In step D, when the user receives a spam report from another user, the report is ignored if it already exists in the user's local spam list. Otherwise, if the corresponding mail is in the user's inbox, the mail is moved to the junk folder and stored in the local spam list in the form of a spam report.
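Step D amounts to a short decision procedure, sketched below. The function name `handle_report`, the dictionaries keyed by MD5 digest, and the returned status strings are all hypothetical; the text does not say what happens when the report matches neither the local list nor the inbox, so this sketch leaves the state unchanged in that case.

```python
def handle_report(report_digest, local_spam_list, inbox, junk):
    """Apply an incoming spam report; inbox and junk map MD5 digest -> mail text."""
    if report_digest in local_spam_list:
        return "ignored"                      # report already known locally
    if report_digest in inbox:
        junk[report_digest] = inbox.pop(report_digest)  # move mail to the junk folder
        local_spam_list.add(report_digest)    # remember it as a spam report
        return "moved"
    return "no_match"                         # assumption: the source leaves this case open

inbox = {"abc123": "Cheap watches!!!", "def456": "Meeting notes"}
junk, known = {}, set()
print(handle_report("abc123", known, inbox, junk))  # moved
print(handle_report("abc123", known, inbox, junk))  # ignored
print(sorted(inbox))  # ['def456']
```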
In step E, when the trust value between users reaches the trust value threshold and the interest similarity reaches the interest similarity threshold, one user's spam report is pushed automatically to the other users. A spam report contains the MD5 hash of the spam content, the interest similarity between the users, and a flag bit marking the mail as spam.
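The report contents and the two-threshold test of step E could be represented as in the sketch below. The names `SpamReport` and `build_reports`, the field names, and the threshold values are hypothetical; "reaches the threshold" is read here as greater than or equal, which is an interpretation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpamReport:
    content_md5: str            # MD5 hash of the spam content
    interest_similarity: float  # similarity between sender and recipient
    is_spam: bool               # flag bit marking the mail as spam

def build_reports(content_md5, friends, trust_threshold, similarity_threshold):
    """friends maps name -> (trust value, interest similarity); returns name -> report."""
    return {
        name: SpamReport(content_md5, similarity, True)
        for name, (trust, similarity) in friends.items()
        if trust >= trust_threshold and similarity >= similarity_threshold
    }

friends = {"bob": (0.9, 0.6), "carol": (0.9, 0.2), "dave": (0.3, 0.8)}
reports = build_reports("d41d8cd98f00b204e9800998ecf8427e", friends, 0.7, 0.5)
print(list(reports))  # ['bob']
```

Only `bob` passes both tests: `carol` is trusted but shares too few interests, and `dave` shares interests but is not trusted enough.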
In another aspect, the invention provides a personalized spam filtering device based on node similarity, comprising the following modules:
Definition module: defines the social trust degree and the interest similarity between users according to the users' social relations and similarity;
Building module: takes social network users and their friends as nodes; builds the user's friend trust value list; builds the user's local interest list and computes the interest similarity between users from interest keywords; builds the user's local spam list from spam information;
Bayesian filtering module: when the user receives a mail, performs first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, it is marked as spam and stored in the local spam list in the form of a spam report;
Personalized filtering module based on node similarity: performs second-layer filtering based on node similarity on the spam reports the user receives;
Sending module: sends spam reports to friend nodes that reach the trust value threshold and the interest similarity threshold.
The technical solution and device provided by the invention have the following beneficial effects:
By building a social network model made up of mail users and taking into account both the social trust values and the interest similarity between users, the invention applies an interest similarity algorithm to compute how credible a friend's spam report is, obtains the spam filtering information held by the user's trusted friends promptly, and improves the accuracy of spam filtering.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only one embodiment of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a topology diagram of a simple mail network in the personalized spam filtering method based on node similarity of the invention;
Fig. 2 is a schematic diagram of spam filtering by the personalized spam filtering device based on node similarity provided by an embodiment of the invention;
Fig. 3 is a structural diagram of the personalized spam filtering device based on node similarity provided by an embodiment of the invention.
Detailed description of the invention
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in further detail below with reference to the drawings.
Embodiment one
The basis of this embodiment is that the system parameters are set sensibly in advance in order to improve spam filtering accuracy.
Interests differ between users, as do the trust values between them. In the personal settings, therefore, the interest keywords used by the system are defined in advance, and each user selects the keywords they are interested in or not interested in. Each user sets an initial trust value for each friend according to how much they trust that friend, and stores it in the user's local trust value table. The interest similarity between the user and each friend is computed with the Jaccard coefficient formula and stored in the user's local interest list. The system updates the user's trust value for each friend by comparing the interest similarity with the configured interest similarity threshold. Each node also keeps a local spam list that holds the computed MD5 values of spam, so that spam information can be stored and shared.
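The trust update described here is stated in claim 3 as T = Tij + b: the trust value grows by a configured increment b when the interest similarity is above the threshold. A minimal sketch follows, assuming the update is applied once per comparison; the function name `update_trust` and the numeric values are illustrative only.

```python
def update_trust(current_trust, similarity, similarity_threshold, increment):
    """Return T = Tij + b when similarity is above the threshold, else Tij unchanged."""
    if similarity > similarity_threshold:
        return current_trust + increment
    return current_trust

# Hypothetical values: initial trust 0.5, increment b = 0.25, threshold 0.3.
print(update_trust(0.5, 0.4, 0.3, 0.25))  # 0.75
print(update_trust(0.5, 0.2, 0.3, 0.25))  # 0.5
```

The source does not say whether the trust value is capped or whether it can decrease, so neither behavior is modeled here.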
When the user receives a mail, the system first computes the MD5 hash of the mail content and matches it against the spam information stored in the local spam list. If there is a match, the mail is marked as spam and moved to the junk folder. Otherwise, the system passes the mail to the spam filter based on the Bayesian algorithm (the first-layer filter) to judge whether it is spam. If it is judged to be spam, the system moves it to the junk folder and stores its information in the local spam list. At the same time, the interest-based spam filter (the second-layer filter) pushes the spam report to those friends whose trust value and interest similarity with this user exceed the thresholds. If the user receives a spam report from a friend, the system judges whether a mail is spam by matching the received report against the mail.
When a friend node of this user receives the spam report, its local system first checks whether the report already exists in the local spam list. If it does, the report is ignored and not processed. Otherwise, if the mail corresponding to the report is in the user's inbox, the system moves the mail to the junk folder, stores the report in the local spam list, and then pushes the report onward.
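The hop-by-hop forwarding described above can be simulated on a small graph. This sketch makes two simplifying assumptions that are not in the patent: `friends` already contains only those friends who pass the trust and similarity thresholds, and every node reached holds the reported mail in its inbox. The function name `propagate_report` is hypothetical.

```python
from collections import deque

def propagate_report(origin, digest, friends, spam_lists):
    """Push a report outward; nodes that already know the digest ignore it."""
    spam_lists[origin].add(digest)
    reached = [origin]
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for friend in friends.get(node, []):
            if digest in spam_lists[friend]:
                continue                   # report already in the local list: ignored
            spam_lists[friend].add(digest)  # store the report locally
            reached.append(friend)
            queue.append(friend)            # this friend pushes the report onward
    return reached

friends = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": []}
spam_lists = {name: set() for name in friends}
print(propagate_report("A", "abc123", friends, spam_lists))  # ['A', 'B', 'C', 'D']
```

Ignoring reports that are already stored is what stops the push from looping: node B forwards to A, but A already holds the digest and drops it.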
Claims (7)
1. A personalized spam filtering method based on node similarity, comprising the following steps:
A. Define the social trust degree and the interest similarity between users according to the users' social relations and similarity;
B. Take social network users as nodes and the relations between users and their contacts as edges, and build the topology graph of the social network. Build the trust value list of the user's friends; build the user's local interest list and compute the interest similarity between users from interest keywords; build the user's local spam list from the spam information obtained;
C. When the user receives a mail, perform first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, mark it as spam and store it in the local spam list in the form of a spam report;
D. Perform second-layer filtering by applying node similarity to the spam reports the user receives;
E. Send the spam report to friend nodes that reach the trust value threshold and the interest similarity threshold.
2. The spam filtering method according to claim 1, characterized in that, in step A, social trust between users means the trust that user A places in user B, derived from the history of direct contact between A and B in the user's system environment; interest similarity means that if two users have interests in common, interest similarity is considered to exist between them, and the more interest keywords they share, the higher the similarity.
3. The spam filtering method according to claim 1, characterized in that, in step B, the trust value is computed as T = Tij + b; that is, on the basis of the configured initial trust value, if the interest similarity between friends is above the threshold, the trust value is increased by the configured increment b. The interest similarity is computed with the Jaccard coefficient: JC = M11 / (M10 + M01 + M11), where M11 is the number of interest keywords common to the two users' interest lists, and M10 and M01 are the numbers of interest keywords that appear only in the first user's list and only in the second user's list, respectively.
In step B, the spam information obtained consists of the spam the user has marked and the spam reports that other contacts have sent to this node; it is stored as the MD5 hash of the mail content.
4. The spam filtering method according to claim 1, characterized in that, in step C, after the user receives a mail, the mail is first matched against the spam information in the local spam list; if no matching entry is found, the Bayesian algorithm is used to classify it, and the local spam list is updated once classification is complete.
5. The spam filtering method according to claim 1, characterized in that, in step D, when the user receives a spam report from another user, the report is ignored if it already exists in the user's local spam list. Otherwise, if the corresponding mail is in the user's inbox, the mail is moved to the junk folder and stored in the local spam list in the form of a spam report.
6. The spam filtering method according to claim 1, characterized in that, in step E, when the trust value between users reaches the trust value threshold and the interest similarity reaches the interest similarity threshold, one user's spam report is pushed automatically to the other users; the spam report contains the MD5 hash of the spam content, the interest similarity between the users, and a flag bit marking the mail as spam.
7. A personalized spam filtering system device based on node similarity, comprising the following modules:
Definition module: defines the social trust degree and the interest similarity between users according to the users' social relations and similarity;
Building module: takes social network users and their friends as nodes; builds the user's friend trust value list; builds the user's local interest list and computes the interest similarity between users from interest keywords; builds the user's local spam list from spam information;
Bayesian filtering module: when the user receives a mail, performs first-layer filtering with a spam filter based on the Bayesian algorithm. If the mail is judged to be spam, it is marked as spam and stored in the local spam list in the form of a spam report;
Personalized filtering module based on node similarity: when the user receives a spam report from another user, performs second-layer filtering by applying node similarity;
Sending module: sends spam reports to friend nodes that reach the trust value threshold and the interest similarity threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610408178.9A CN106100973A (en) | 2016-06-07 | 2016-06-07 | Personalized spam filtering method and filtering device based on node similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610408178.9A CN106100973A (en) | 2016-06-07 | 2016-06-07 | Personalized spam filtering method and filtering device based on node similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106100973A true CN106100973A (en) | 2016-11-09 |
Family
ID=57228760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610408178.9A Pending CN106100973A (en) | 2016-06-07 | 2016-06-07 | Personalized spam filtering method and filtering device based on node similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106100973A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107171944A (en) * | 2017-06-27 | 2017-09-15 | 北京二六三企业通信有限公司 | The recognition methods of spam and device |
CN109639838A (en) * | 2019-02-13 | 2019-04-16 | 广州秦耀照明电器有限公司 | A kind of information classification storage system based on big data |
CN110753024A (en) * | 2018-07-23 | 2020-02-04 | 南京航空航天大学 | Personalized mail re-filtering method in collective environment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778213A (en) * | 2015-03-19 | 2015-07-15 | 同济大学 | Social network recommendation method based on random walk |
Non-Patent Citations (4)
Title |
---|
XIN LIU ET AL.: ""A decentralized and personalized spam filter based on social computing"", 《2014 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING CONFERENCE (IWCMC)》 * |
ZE LI ET AL.: ""Soap: A social network aided personalized and effective spam filter to clean your e-mail box"", 《2011 PROCEEDINGS IEEE INFOCOM》 * |
刘昕 等: ""基于社会信任的恶意网页协防机制"", 《通信学报》 * |
贺银慧: ""社会网络中用户信任关系的研究及其应用"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10911383B2 (en) | Spam filtering and person profiles | |
Golbeck et al. | Reputation Network Analysis for Email Filtering. | |
US9311415B2 (en) | Generating contact suggestions | |
US8688793B2 (en) | System and method for insertion of addresses in electronic messages | |
Wang | Detecting spam bots in online social networking sites: a machine learning approach | |
US10778624B2 (en) | Systems and methods for spam filtering | |
US7469292B2 (en) | Managing electronic messages using contact information | |
US8600965B2 (en) | System and method for observing communication behavior | |
US20040044536A1 (en) | Providing common contact discovery and management to electronic mail users | |
US8296372B2 (en) | Method and system for merging electronic messages | |
CN106100973A (en) | Personalized spam filtering method and filtering device based on node similarity | |
CN103297317B (en) | A kind of send mail method, a kind of electronic equipment | |
Murukannaiah et al. | Platys Social: Relating shared places and private social circles | |
CN103346956A (en) | Method and device for expanding social relation in social network | |
Diehl et al. | Name reference resolution in organizational email archives | |
CN102299868A (en) | Method, client and system for transmitting and receiving email | |
JP2005244647A (en) | Community forming device | |
CN103841121B (en) | A kind of comment and interaction systems and method based on local file | |
CN109685129A (en) | A kind of multiclass social application subject information cluster association method based on smart phone | |
Blosser et al. | Privacy preserving collaborative social network | |
US20160072752A1 (en) | Filtering electronic messages based on domain attributes without reputation | |
Moniza et al. | An assortment of spam detection system | |
Bhalerao et al. | Improved social network aided personalized spam filtering approach using RBF neural network | |
Li et al. | Sentiment Diffusion of Social Inequality in Microblogs: A Case Study of “Migrant Worker” in Sina Weibo | |
Limanto et al. | Social Media Spamming on the Class Room Electronic |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161109 |