CN106341303A - Sender credibility generation method based on mail user behavior - Google Patents

Info

Publication number
CN106341303A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510401224.8A
Other languages
Chinese (zh)
Other versions
CN106341303B (en)
Inventor
何庆
魏丽丽
许敬伟
周乐坤
梁宇文
张坚
刘再元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Medium shift information technology Co., Ltd.
Polytron Technologies Inc
Original Assignee
POLYTRON TECHNOLOGIES Inc
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by POLYTRON TECHNOLOGIES Inc and China Mobile Group Guangdong Co Ltd
Priority to CN201510401224.8A
Publication of CN106341303A
Application granted
Publication of CN106341303B
Legal status: Active

Landscapes

  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a sender credibility generation method based on mail user behavior. The method comprises the steps of A) system initialization, B) sender credibility value generation, and C) sender credibility value storage. In step A), a configuration file is loaded, the sender's feature values are extracted from log information, and a feature database is connected. In step B), the user's historical send count, send success rate, same-day send count, whether recipients reply, and the mail content are analyzed to generate a corresponding credibility value. Compared with common sender credibility generation methods, the method performs user behavior analysis on massive logs and comprehensively considers the main characteristics of spam, such as the sender's total send count, same-day send count, send success rate, mail size, mail content, and trusted-domain sending, to generate the sender's credibility value. This avoids misjudging spam and improves spam filtering.

Description

Sender reputation generation method based on mail user behavior
Technical field
The invention belongs to the field of spam processing and relates in particular to sender reputation generation: it is a method for generating a sender's reputation from the historical behavior of mail users.
Background
With the widespread use of email, the accompanying spam problem has grown increasingly serious. Spam not only consumes network resources, occupies bandwidth, and wastes users' time and connection fees, it also seriously threatens network security; it has become a public nuisance of the network and causes serious economic losses. A survey report issued by the Anti-Spam Center of the Internet Society of China shows that spam keeps growing in scale and that spam accounts for 55.65% of the mail Chinese internet users receive each week on average. Effective technology is urgently needed to curb the spread of spam. Conventional spam filtering techniques mainly include sender authentication, blacklists and whitelists, content filtering, fingerprinting, and Bayesian filtering, but each has shortcomings to some degree and none can classify with complete accuracy. Patent application No. 201310115340.4, entitled "System and method for filtering spam email messages based on user reputation", describes an anti-spam method in which, within a user group, the rules derived from the feedback of users with high reputation values are synchronized to users with low reputation values. This approach alters users' own rules, cannot reflect a user's real situation, and cannot adjust a user's reputation value in real time. To improve the accuracy of spam interception, a new interception technique is urgently needed.
Summary of the invention
The object of the invention is to provide a method that generates a sender reputation value based on user behavior analysis, so that spam can be filtered more effectively.
This object is achieved by a method for generating a reputation value based on user behavior analysis, comprising the steps of:
A) System initialization: load the configuration file, extract the sender's feature values from log information, and connect to the feature database, comprising:
(a1) Preparation before running: load and analyze the massive log files, and extract from the log information the mail body size, number of successful sends, number of failed sends, total number of sends, number of recipient replies, mail content, sender domain name, and the IP's successful and failed send counts;
(a2) Save the feature values extracted from the logs to the feature database;
B) Sender reputation value generation phase: this phase mainly analyzes the user's historical send count, send success rate, same-day send count, whether recipients reply, and the mail content to generate the corresponding reputation value;
C) Sender reputation value storage phase, with the following specific steps:
(c1) If the sender's feature values match a reputation value described in step B), save the generated reputation value to the database;
(c2) If the sender's feature values match none of the reputation values described in step B), save the feature values to the database for analysis in the next run.
Compared with common sender reputation generation methods, the invention has the following beneficial effect: by performing user behavior analysis on massive logs, it generates the sender reputation value from a combined consideration of the main characteristics of spam, such as the sender's total send count, same-day send count, send success rate, mail size, mail content, and trusted-domain sending. This avoids misjudging spam and improves spam filtering.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description
As shown in Fig. 1, the invention relates to a method for generating a sender reputation value based on user behavior analysis, comprising the steps of:
A) System initialization: load the configuration file, extract the sender's feature values from log information, and connect to the feature database;
(1) Preparation before running: load and analyze the massive log files, and extract from the log information the mail body size, number of successful sends, number of failed sends, total number of sends, number of recipient replies, mail content, sender domain name, the IP's successful and failed send counts, and similar information;
(2) Save the feature values extracted from the logs to the feature database.
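As an illustration of step A, the sketch below folds parsed log records into per-sender feature values. It is a minimal sketch under stated assumptions, not the patented implementation: the record keys (`sender`, `delivered`, `replied`, `body_bytes`), the `SenderFeatures` class, and the reading of "500k" as 500 KB are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

LARGE_BODY_BYTES = 500 * 1024  # assumption: "500k" in the text means 500 KB


@dataclass
class SenderFeatures:
    """Per-sender aggregates named after the features listed in step A."""
    total_sent: int = 0
    success_count: int = 0
    failure_count: int = 0
    reply_count: int = 0
    large_mail_count: int = 0              # messages whose body exceeds 500 KB
    domains: set = field(default_factory=set)

    @property
    def success_rate(self) -> float:
        # Guard against division by zero for senders with no history.
        return self.success_count / self.total_sent if self.total_sent else 0.0


def extract_features(log_records):
    """Fold an iterable of log-record dicts into a {sender: SenderFeatures} map."""
    features = defaultdict(SenderFeatures)
    for rec in log_records:
        f = features[rec["sender"]]
        f.total_sent += 1
        if rec["delivered"]:
            f.success_count += 1
        else:
            f.failure_count += 1
        if rec.get("replied", False):
            f.reply_count += 1
        if rec.get("body_bytes", 0) > LARGE_BODY_BYTES:
            f.large_mail_count += 1
        # The sender domain is taken from the part after the last "@".
        f.domains.add(rec["sender"].rsplit("@", 1)[-1])
    return dict(features)
```

In a real deployment the resulting map would be written to the feature database of step (2); here it is simply returned so the aggregation logic can be inspected on its own.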
B) Sender reputation value generation phase: this phase mainly analyzes the user's historical send count, send success rate, same-day send count, whether recipients reply, and the mail content to generate the corresponding reputation value. The main steps are as follows:
(1) After the feature values are retrieved from the database, they are evaluated. If the sender's historical send total is less than 3 messages, there is too little data to generate a reputation value, and the flow ends directly;
(2) When the historical send volume is more than 3 messages and the send success rate is below 0.76, the reputation value is set to 30 points;
(3) When the send success rate is 100%, the success rate in the IP's send records is 100%, and any one of the following conditions holds: the recipient has replied, the mail content matches a trusted keyword, a mail larger than 500k has been sent, or mail has been sent to a trusted domain, the reputation value is set to 40 points;
(4) When the send volume is more than 5 messages, the number of failed sends is 0, the recipient total is more than 3, and the mail contains a trusted keyword, the reputation value is set to 80 points;
(5) When the send volume is more than 5 messages and the number of failed sends is 0, if more than 1 message has been sent that day and any one of the following conditions holds: the mail matches more than 2 trusted keywords, mail has been sent to a trusted domain, the recipient has replied, or more than 2 of the sent mails are larger than 500k, the reputation value is set to 80 points;
(6) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the sending involves a trusted domain, and more than 1 message has been sent that day, the reputation value is set to 70 points;
(7) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, a recipient has replied, and more than 1 message has been sent that day, the reputation value is set to 70 points;
(8) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the mail content contains trusted keywords and matches more than 2 of them, and more than 1 message has been sent that day, the reputation value is set to 70 points;
(9) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the mail content contains trusted keywords, and at least 1 sent mail is larger than 500k, the reputation value is set to 70 points;
(10) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the mail content contains trusted keywords, and there are identical recipients with a recipient total of more than 3, the reputation value is set to 70 points;
(11) When the send volume is more than 5 messages, the number of failed sends is greater than 2 and up to 9, the number of failed sends is 3, and fewer than 3 messages have been sent that day, the reputation value is set to 30 points;
(12) When the send volume is more than 5 messages, the number of failed sends is greater than 2 and up to 9, the send volume is more than 20 messages, the mail content matches more than 4 trusted keywords, the recipient total is more than 12, and more than 4 recipients are identical, the reputation value is set to 70 points;
(13) When the send volume is more than 5 messages, the number of failed sends is greater than 2 and up to 9, the send volume is more than 20 messages, the mail content matches more than 4 trusted keywords, and more than 4 messages have been sent that day, the reputation value is set to 70 points;
(14) When the number of sends is less than 5 messages, the number of failed sends is greater than 0 and up to 2, at least 1 mail is larger than 500k, and the mail content contains trusted keywords, the reputation value is set to 70 points.
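The rule table above can be read as an ordered series of threshold checks. The sketch below encodes only rules (1), (2), (6), and (7) to show the shape of such an evaluator; it is a hedged illustration, not the patented logic. The `Features` class, its field names, and the first-match-wins ordering are assumptions, since the text does not say how overlapping rules are resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Features:
    """Hypothetical per-sender feature values used by the rules."""
    total_sent: int
    failure_count: int
    sent_today: int
    has_reply: bool
    sent_to_trusted_domain: bool

    @property
    def success_rate(self) -> float:
        if self.total_sent == 0:
            return 0.0
        return (self.total_sent - self.failure_count) / self.total_sent


def reputation(f: Features) -> Optional[int]:
    """Return a reputation score, or None when no encoded rule matches.

    Assumption: rules are tried in document order and the first match wins.
    """
    # Rule (1): fewer than 3 messages is too little data to score.
    if f.total_sent < 3:
        return None
    # Rule (2): more than 3 messages and a success rate below 0.76.
    if f.total_sent > 3 and f.success_rate < 0.76:
        return 30
    # Rules (6) and (7): more than 5 messages, 1 to 2 failures, more than
    # 1 message sent today, plus either a trusted domain or a reply.
    if f.total_sent > 5 and 0 < f.failure_count <= 2 and f.sent_today > 1:
        if f.sent_to_trusted_domain or f.has_reply:
            return 70
    return None
```

Note that a sender with exactly 3 messages satisfies neither rule (1) nor rule (2) as written, so this sketch leaves that sender unscored; the remaining rules would be added as further checks in the same style.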
C) Sender reputation value storage phase, with the following specific steps:
(1) If the sender's feature values match one of the rules above, save the generated reputation value to the database.
(2) If the sender's feature values match none of the rules, save the feature values to the database for analysis in the next run.
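The two storage branches of step C can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table names `reputation` and `pending`, the JSON encoding of feature values, and the helper names `open_store` and `store_result` are hypothetical and do not come from the patent.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small database with one table per storage branch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reputation (sender TEXT PRIMARY KEY, score INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending (sender TEXT PRIMARY KEY, features TEXT)"
    )
    return conn


def store_result(conn, sender, features, score):
    """Save the score if a rule matched; otherwise keep the features for the next run."""
    if score is not None:
        # Branch (1): a rule matched, so persist the generated reputation value.
        conn.execute(
            "INSERT OR REPLACE INTO reputation VALUES (?, ?)", (sender, score)
        )
        conn.execute("DELETE FROM pending WHERE sender = ?", (sender,))
    else:
        # Branch (2): no rule matched, so keep the feature values for re-analysis.
        conn.execute(
            "INSERT OR REPLACE INTO pending VALUES (?, ?)",
            (sender, json.dumps(features)),
        )
    conn.commit()
```

Keeping unmatched senders in a separate table makes the "analyze again next time" step a simple scan of `pending`, at the cost of one extra delete when a sender is finally scored.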
Analysis of users' long-term sending behavior shows that a user's historical sending behavior is predictive of whether future email will be spam. For example, if a sender has sent spam in the past, the probability that a later message from that sender is spam is very high. Analysis of users' sending behavior by an intelligent algorithm shows that spam has the following characteristics:
1) The mail size is not very large, because a large size slows spam delivery.
2) The send success rate is not high, because some messages are intercepted by anti-spam systems.
3) The send volume is large, and the mail is usually sent with bulk-mailing tools.
4) Recipients do not reply.
5) The mail content is mostly advertising, political, or pornographic.
6) The sending domain is mostly an unfamiliar domain.
Using machine learning, the massive logs produced online are analyzed, and multiple feature dimensions are selected: mail body size, number of successful sends, number of failed sends, total number of sends, number of recipient replies, mail content, sender domain name, and the IP's successful and failed send counts. (i) A feature model is trained on the massive logs; (ii) an overall reputation score library is generated from these feature values, and mail is matched against this feature reputation library in real time; (iii) a specific sender reputation value is generated for each sender that meets the conditions, which improves the accuracy of the reputation value.
Generating the sender reputation value intelligently from the above characteristics is a good method for filtering spam; it has proved highly effective in practice, with a very low misjudgment rate.
In summary, the beneficial effect of the invention is that, by performing user behavior analysis on massive logs, it generates the sender reputation value from a combined consideration of the main characteristics of spam, such as the sender's total send count, same-day send count, send success rate, mail size, mail content, and trusted-domain sending. This prevents any single feature from skewing the reputation value and causing spam to be misjudged, and it improves spam filtering.

Claims (2)

1. A sender reputation generation method based on mail user behavior, comprising the steps of:
A) System initialization: load the configuration file, extract the sender's feature values from log information, and connect to the feature database, comprising:
(a1) Preparation before running: load and analyze the massive log files, and extract from the log information the mail body size, number of successful sends, number of failed sends, total number of sends, number of recipient replies, mail content, sender domain name, and the IP's successful and failed send counts;
(a2) Save the feature values extracted from the logs to the feature database;
B) Sender reputation value generation phase: this phase mainly analyzes the user's historical send count, send success rate, same-day send count, whether recipients reply, and the mail content to generate the corresponding reputation value;
C) Sender reputation value storage phase, with the following specific steps:
(c1) If the sender's feature values match a reputation value described in step B), save the generated reputation value to the database;
(c2) If the sender's feature values match none of the reputation values described in step B), save the feature values to the database for analysis in the next run.
2. The sender reputation generation method based on sender features as claimed in claim 1, characterized in that step B) comprises the following steps:
(b1) After the feature values are retrieved from the database, they are evaluated. If the sender's historical send total is less than 3 messages, there is too little data to generate a reputation value, and the flow ends directly;
(b2) When the historical send volume is more than 3 messages and the send success rate is below 0.76, the reputation value is set to a first reputation value;
(b3) When the send success rate is 100%, the success rate in the IP's send records is 100%, and any one of the following conditions holds: the recipient has replied, the mail content matches a trusted keyword, a mail larger than 500k has been sent, or mail has been sent to a trusted domain, the reputation value is set to a second reputation value;
(b4) When the send volume is more than 5 messages, the number of failed sends is 0, the recipient total is more than 3, and the mail contains a trusted keyword, the reputation value is set to a third reputation value;
(b5) When the send volume is more than 5 messages and the number of failed sends is 0, if more than 1 message has been sent that day and any one of the following conditions holds: the mail matches more than 2 trusted keywords, mail has been sent to a trusted domain, the recipient has replied, or more than 2 of the sent mails are larger than 500k, the reputation value is set to a fourth reputation value;
(b6) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the sending involves a trusted domain, and more than 1 message has been sent that day, the reputation value is set to a fifth reputation value;
(b7) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, a recipient has replied, and more than 1 message has been sent that day, the reputation value is set to a sixth reputation value;
(b8) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the mail content contains trusted keywords and matches more than 2 of them, and more than 1 message has been sent that day, the reputation value is set to a seventh reputation value;
(b9) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the mail content contains trusted keywords, and at least 1 sent mail is larger than 500k, the reputation value is set to an eighth reputation value;
(b10) When the send volume is more than 5 messages, the number of failed sends is greater than 0 and up to 2, the mail content contains trusted keywords, and there are identical recipients with a recipient total of more than 3, the reputation value is set to a ninth reputation value;
(b11) When the send volume is more than 5 messages, the number of failed sends is greater than 2 and up to 9, the number of failed sends is 3, and fewer than 3 messages have been sent that day, the reputation value is set to a tenth reputation value;
(b12) When the send volume is more than 5 messages, the number of failed sends is greater than 2 and up to 9, the send volume is more than 20 messages, the mail content matches more than 4 trusted keywords, the recipient total is more than 12, and more than 4 recipients are identical, the reputation value is set to an eleventh reputation value;
(b13) When the send volume is more than 5 messages, the number of failed sends is greater than 2 and up to 9, the send volume is more than 20 messages, the mail content matches more than 4 trusted keywords, and more than 4 messages have been sent that day, the reputation value is set to a twelfth reputation value;
(b14) When the number of sends is less than 5 messages, the number of failed sends is greater than 0 and up to 2, at least 1 mail is larger than 500k, and the mail content contains trusted keywords, the reputation value is set to a thirteenth reputation value.
CN201510401224.8A 2015-07-10 2015-07-10 Sender reputation's generation method based on mail user behavior Active CN106341303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510401224.8A CN106341303B (en) 2015-07-10 2015-07-10 Sender reputation's generation method based on mail user behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510401224.8A CN106341303B (en) 2015-07-10 2015-07-10 Sender reputation's generation method based on mail user behavior

Publications (2)

Publication Number Publication Date
CN106341303A true CN106341303A (en) 2017-01-18
CN106341303B CN106341303B (en) 2019-05-21

Family

ID=57827106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510401224.8A Active CN106341303B (en) 2015-07-10 2015-07-10 Sender reputation's generation method based on mail user behavior

Country Status (1)

Country Link
CN (1) CN106341303B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1746916A (en) * 2005-10-25 2006-03-15 二六三网络通信股份有限公司 Network IP address credit assessment and use in electronic mail system
CN101136874A (en) * 2007-07-25 2008-03-05 华南理工大学 Compound decision based anti-rubbish E-mail error filtering method and system
CN101887523A (en) * 2010-06-21 2010-11-17 南京邮电大学 Method for detecting image spam email by picture character and local invariant feature
CN102103700A (en) * 2011-01-18 2011-06-22 南京邮电大学 Land mobile distance-based image spam similarity-detection method
CN102413076A (en) * 2011-12-22 2012-04-11 网易(杭州)网络有限公司 Spam mail judging system based on behavior analysis
CN103905289A (en) * 2012-12-26 2014-07-02 航天信息软件技术有限公司 Spam mail filtering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
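张启宇: "Research and Design of a Spam Filtering System Based on the Bayesian Algorithm", China Master's Theses Full-text Database, Information Science and Technology *
李璇: "Research and Application of Spam Filtering Technology Based on Behavior Recognition", China Master's Theses Full-text Database, Information Science and Technology *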

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109801027A (en) * 2017-11-16 2019-05-24 阿里巴巴集团控股有限公司 Data processing method and device, server, storage medium
WO2019137290A1 (en) * 2018-01-09 2019-07-18 论客科技(广州)有限公司 Sender reputation value generation method and spam filtering method
US11343213B2 (en) * 2018-01-09 2022-05-24 Lunkr Technology (Guangzhou) Co., Ltd. Method for generating reputation value of sender and spam filtering method
CN110213152A (en) * 2018-05-02 2019-09-06 腾讯科技(深圳)有限公司 Identify method, apparatus, server and the storage medium of spam
CN110740089A (en) * 2018-07-20 2020-01-31 深信服科技股份有限公司 mass-sending spam detection method, device and equipment
CN114070644A (en) * 2021-11-26 2022-02-18 天翼数字生活科技有限公司 Junk mail intercepting method and device, electronic equipment and storage medium
CN114070644B (en) * 2021-11-26 2024-04-02 天翼数字生活科技有限公司 Junk mail interception method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106341303B (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN106341303A (en) Sender credibility generation method based on mail user behavior
US8713014B1 (en) Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems
US8214438B2 (en) (More) advanced spam detection features
US7660865B2 (en) Spam filtering with probabilistic secure hashes
US20070038705A1 (en) Trees of classifiers for detecting email spam
Saadat Survey on spam filtering techniques
US9596202B1 (en) Methods and apparatus for throttling electronic communications based on unique recipient count using probabilistic data structures
CN102842078A (en) Email forensic analyzing method based on community characteristics analysis
CN105337993A (en) Dynamic and static combination-based mail security detection device and method
CN102377690B (en) Anti-spam gateway system and method
CN101969411B (en) A kind of analysis-reduction method and system of non-encrypted WEB mail
US20080235798A1 (en) Method for filtering junk messages
CN107707462A (en) Spam emergency processing method based on cloud computing
CN105635080A (en) E-mail safety management system and method based on content filtering
CN101217555A (en) An intelligent anti-waster and anti-virus gateway and the corresponding filtering method
CN101795273B (en) Method and device for filtering junk mail
CN103595614A (en) User feedback based junk mail detection method
CN106230690B (en) A kind of process for sorting mailings and system of combination user property
CN101094197A (en) Method and mail server of anti garbage mail
CN110048936B (en) Method for judging junk mail by semantic associated words
CN107231287B (en) Marketing mail sending method and sending information statistical method and device
Paul et al. A privatised approach in enhanced spam filtering techniques using TSAS over cloud networks
CN113746814A (en) Mail processing method and device, electronic equipment and storage medium
CN106713108B (en) A kind of process for sorting mailings of combination customer relationship and bayesian theory
Karishma et al. Spam Detection using Recurrent Neural Networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181105

Address after: 518000 11, 41 floor, Guo Tong Building, 9023 Binhe Road, Futian District, Shenzhen, Guangdong.

Applicant after: Medium shift information technology Co., Ltd.

Applicant after: Polytron Technologies Inc

Address before: 518000 01-11, 4 floor, Changhong science and technology building, 18 Nanshan District science and technology south twelve Road, Shenzhen, Guangdong.

Applicant before: Polytron Technologies Inc

Applicant before: China Mobile Communication Group Guangdong Co., Ltd.

GR01 Patent grant