CN103812826A - Identification method, identification system, and filter system of spam mail - Google Patents

Info

Publication number
CN103812826A
Authority
CN
China
Prior art keywords
user communication
user
mail
spam
contact person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210442421.0A
Other languages
Chinese (zh)
Inventor
于洪涌
郭涛
张京晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201210442421.0A priority Critical patent/CN103812826A/en
Publication of CN103812826A publication Critical patent/CN103812826A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to anti-spam technology and discloses a spam identification method, identification system, and filtering system based on user communication behavior. The method comprises: extracting the e-mail addresses of the sender and the addressee of a received e-mail; searching, according to those addresses, a user communication coefficient database for a user communication coefficient of the sender relative to the addressee, where the user communication coefficient is built from user communication behavior and expresses the degree of contact between the sender and the addressee; and, if a corresponding user communication coefficient exists, determining according to it whether the e-mail is spam for the addressee. Because the scheme is based on user communication behavior, it is difficult to evade, it takes each user's individual communication relationships fully into account, and it improves the accuracy of spam identification and filtering.

Description

Spam identification method, identification system, and filtering system
Technical field
The present invention relates to the field of anti-spam technology, and in particular to a spam identification method, identification system, and filtering system based on user communication behavior.
Background
E-mail is one of the basic applications of today's Internet users. Spam generally refers to any e-mail that is forced into a user's mailbox without the user's permission. Monitoring data from December 2010 showed that about 50 billion spam messages were sent worldwide every day. Spam content includes promotional advertising, adult advertising, money-making schemes, and destructive e-mail such as messages carrying computer viruses. The major mail providers therefore treat better anti-spam performance as a key part of improving the mailbox user experience.
Traditional anti-spam systems rely on features of the mail itself: mail features are defined in terms of keywords, mail structure, and the like; the relevant content of a new mail is then extracted and compared with the defined features; and spam is identified according to the degree of match. Mail identified as spam can be intercepted.
The traditional method of identifying spam from features of the mail itself has the following main shortcomings:
First, spammers can easily evade it by changing keywords, modifying the mail structure, and so on, which lowers the accuracy of spam identification;
Second, the same mail may be regarded as spam by some users and as normal mail by others. The traditional one-size-fits-all filtering approach does not consider differences between users and may interfere with users' normal use of e-mail.
Summary of the invention
The technical problem to be solved by an embodiment of the present invention is to provide a spam identification method, identification system, and filtering system that address the low accuracy of traditional spam identification and the way one-size-fits-all filtering interferes with users' use of e-mail.
According to one aspect of the embodiments of the present invention, a spam identification method is provided, comprising:
extracting the e-mail addresses of the sender and the addressee of a received e-mail; searching, according to those addresses, a user communication coefficient library for a user communication coefficient of the sender relative to the addressee, where the user communication coefficient is built from user communication behavior and represents the degree of contact between the addressee and the sender; and, if a corresponding user communication coefficient exists, determining according to it whether the e-mail is spam for the addressee.
In an exemplary implementation, before the e-mail is received, the method further comprises establishing a user communication behavior vector library and the user communication coefficient library from user communication behavior information, specifically: collecting user communication behavior information; forming user communication behavior vectors from the collected information and saving them in the user communication behavior vector library; and forming user communication coefficients from the user communication behavior vectors and saving them in the user communication coefficient library.
In an exemplary implementation, collecting user communication behavior information specifically comprises receiving the user communication behavior information that users upload through their terminals. The user communication behavior information comprises the correspondence between contacts' mailboxes and telephone numbers, blacklist/whitelist data, voice communication data, short message communication data, and e-mail communication data. The blacklist/whitelist data comprise the blacklist and whitelist in the user's address book. The voice communication data comprise rejected calls, call frequency, and the frequency of calls initiated by the user. The short message communication data comprise short message exchange frequency, the frequency of messages initiated by the user, and telephone numbers marked as sources of spam messages. The e-mail communication data comprise the frequency of e-mail sent and received, the frequency of e-mail initiated by the user, and e-mail addresses marked as spam.
In an exemplary implementation, forming a user communication behavior vector from the collected user communication behavior information specifically comprises: extracting the user's e-mail address and telephone number from the user communication behavior information, forming a user communication behavior vector with this e-mail address as its primary index, and adding the user's telephone number to the vector; extracting the e-mail addresses of the user's contacts from the user communication behavior information and forming subvectors of the user communication behavior vector with each contact's e-mail address as a secondary index; and forming the components of each subvector from the user communication behavior information.
In an exemplary implementation, forming the components of a subvector from the user communication behavior information specifically comprises:
generating a blacklist/whitelist component from the blacklist/whitelist data in the collected information, to indicate whether the contact is on the user's blacklist or whitelist; generating a voice component from the voice communication data, to represent the frequency of voice contact between the user and the contact and how actively the user responds to the contact's incoming calls; generating a short message component from the short message communication data, to represent the frequency of short message contact between the user and the contact and how actively the user responds to the contact's messages; and generating an e-mail component from the e-mail communication data, to represent the frequency of e-mail contact between the user and the contact and how actively the user responds to the contact's e-mail.
In an exemplary implementation, forming the user communication coefficient from the user communication behavior vector specifically comprises: computing a weighted sum of the components of the subvector for the user and the contact in the user communication behavior vector, according to their respective weights; and determining the user communication coefficient from the result, to represent the degree of contact between the user and the contact.
In an exemplary implementation, determining according to the user communication coefficient whether the e-mail is spam for the addressee further comprises: matching the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result; and making a combined judgment from the user communication coefficient and the matching result to determine whether the e-mail is spam for the addressee.
According to another aspect of the embodiments of the present invention, a spam identification system is provided, comprising:
a user communication behavior information receiving unit, configured to receive the user communication behavior information that users upload through their terminals; a user communication behavior vector library, configured to form user communication behavior vectors from the user communication behavior information, where each subvector of a user communication behavior vector represents one of the user's contacts and comprises components that reflect the contact between that contact and the user over different communication modes; a user communication coefficient library, configured to form user communication coefficients from the user communication behavior vectors, to represent the degree of contact between the user and each contact; and a spam comprehensive processing unit, configured to extract the e-mail addresses of the sender and the addressee of a received e-mail, to search the user communication coefficient library, according to those addresses, for a user communication coefficient of the sender relative to the addressee, and, if a corresponding user communication coefficient exists, to determine according to it whether the e-mail is spam for the addressee.
In an exemplary implementation, the user communication behavior vector library comprises: a vector forming unit, configured to extract the user's e-mail address and telephone number from the user communication behavior information, form a user communication behavior vector with this e-mail address as its primary index, and add the user's telephone number to the vector; a subvector forming unit, configured to extract the e-mail addresses of the user's contacts from the user communication behavior information and form subvectors of the user communication behavior vector with each contact's e-mail address as a secondary index; and a component forming unit, configured to form the components of each subvector from the user communication behavior information.
In an exemplary implementation, the component forming unit comprises at least one of the following units:
a blacklist/whitelist component forming unit, configured to generate a blacklist/whitelist component from the blacklist/whitelist data in the collected user communication behavior information, to indicate whether the contact is on the user's blacklist or whitelist; a voice component forming unit, configured to generate a voice component from the voice communication data, to represent the frequency of voice contact between the user and the contact and how actively the user responds to the contact's incoming calls; a short message component forming unit, configured to generate a short message component from the short message communication data, to represent the frequency of short message contact between the user and the contact and how actively the user responds to the contact's messages; and an e-mail component forming unit, configured to generate an e-mail component from the e-mail communication data, to represent the frequency of e-mail contact between the user and the contact and how actively the user responds to the contact's e-mail.
In an exemplary implementation, the user communication coefficient library is further configured to match the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result, and to make a combined judgment from the user communication coefficient and the matching result to determine whether the e-mail is spam for the addressee.
According to yet another aspect of the embodiments of the present invention, a spam filtering system is provided, comprising the spam identification system described above, a mail transfer agent system, and a mail delivery agent system. The mail transfer agent system forwards users' e-mail to the spam identification system; the spam identification system determines, according to the user communication coefficient, whether the e-mail is spam for the addressee; and the mail delivery agent system delivers or intercepts the e-mail according to the identification result of the spam identification system.
In an exemplary implementation, the system further comprises a traditional spam identification system, and the mail transfer agent system also forwards users' e-mail to the traditional spam identification system. The traditional spam identification system matches the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result. The spam identification system is further configured to make a combined judgment from the user communication coefficient and the matching result to determine whether the e-mail is spam for the addressee.
In the scheme provided by the present invention, a user communication behavior vector library is built from user communication behavior information, and a user communication coefficient library is then built from it. When an e-mail is subsequently received, the e-mail addresses of its sender and addressee are extracted, and the user communication coefficient library is searched, according to those addresses, for a user communication coefficient of the sender relative to the addressee. If a corresponding user communication coefficient exists, whether the e-mail is spam for the addressee is determined according to that coefficient. A spam identification scheme based on user communication behavior is not easily evaded and takes each user's individual communication relationships fully into account, which improves the accuracy of spam identification and filtering. In addition, a combined judgment can be made from the user communication coefficient and the identification result of a traditional spam identification system to determine whether the e-mail is spam for the addressee, which can further improve the accuracy of spam identification and filtering.
Other features and advantages of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the spam identification method of the present invention.
Fig. 2 is an exemplary flowchart of building a user communication coefficient from user communication behavior according to the present invention.
Fig. 3 is an exemplary flowchart of forming a user communication behavior vector according to the present invention.
Fig. 4 is a schematic diagram of an exemplary user communication behavior vector of the present invention.
Fig. 5 is a complete flowchart of spam identification and filtering according to the present invention.
Fig. 6 is a structural diagram of an embodiment of the spam identification system of the present invention.
Fig. 7 is a structural diagram of another embodiment of the spam identification system of the present invention.
Fig. 8 is a structural diagram of an embodiment of the spam filtering system of the present invention.
Fig. 9 is a structural diagram of another embodiment of the spam filtering system of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention or its application or use. All other embodiments that those of ordinary skill in the art obtain from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.
It should also be understood that, for convenience of description, the parts shown in the drawings are not drawn to actual scale.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it does not need to be discussed further in subsequent drawings.
Fig. 1 is a flowchart of an embodiment of the spam identification method of the present invention. As shown in Fig. 1, the method of this embodiment comprises the following steps:
S102, extract the e-mail addresses of the sender and the addressee of a received e-mail;
S104, according to the e-mail addresses of the sender and the addressee, search the user communication coefficient library for a user communication coefficient of the sender relative to the addressee, where the user communication coefficient is built from user communication behavior and represents the degree of contact between the addressee and the sender;
S106, if a corresponding user communication coefficient exists, determine according to it whether the e-mail is spam for the addressee. The lower the user communication coefficient, the more likely the e-mail is spam. A threshold can therefore be set: if the user communication coefficient is below the threshold, the e-mail is spam; otherwise it is not spam.
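Steps S102 to S106 can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory dictionary standing in for the user communication coefficient library, the function name classify_by_coefficient, the 0-100 coefficient scale, and the threshold value of 30 are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the user communication coefficient library.
# Key: (addressee address, sender address); value: coefficient (assumed 0-100).
CoefficientLibrary = Dict[Tuple[str, str], float]

SPAM_THRESHOLD = 30.0  # assumed value; the text only says "a threshold can be set"


def classify_by_coefficient(
    library: CoefficientLibrary,
    sender: str,
    addressee: str,
    threshold: float = SPAM_THRESHOLD,
) -> Optional[bool]:
    """Return True (spam), False (not spam), or None if no coefficient exists."""
    # S104: look up the coefficient of this sender relative to this addressee.
    coefficient = library.get((addressee.lower(), sender.lower()))
    if coefficient is None:
        return None  # no entry: the text leaves this case to other mechanisms
    # S106: a coefficient below the threshold means spam for this addressee.
    return coefficient < threshold


library = {
    ("bob@example.com", "alice@example.com"): 85.0,
    ("bob@example.com", "promo@example.net"): 5.0,
}
print(classify_by_coefficient(library, "alice@example.com", "bob@example.com"))
print(classify_by_coefficient(library, "promo@example.net", "bob@example.com"))
```

The key is ordered (addressee, sender) because the coefficient is directional: it describes how the addressee relates to the sender, not the reverse.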
When identifying spam, a combined judgment can also be made from the user communication coefficient and the identification result of a traditional spam identification system to determine whether the e-mail is spam for the addressee.
An exemplary method of combined judgment is as follows: compute a weighted sum of the user communication coefficient and the identification result of the traditional spam identification system according to their respective weights, then compare the result with a predetermined threshold to determine whether the e-mail is spam for the addressee.
It should be noted that the traditional spam identification system can identify spam from features of the mail itself. For example, mail features are defined in terms of keywords, mail structure, and the like; the relevant content of a new mail (such as its keywords or mail structure) is extracted and matched against the defined mail features; and whether the mail is spam is decided according to the degree of match. The traditional spam identification system can of course also use other known spam identification methods; the present invention places no limitation on this.
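A very small keyword-based matcher of the kind described above might look like this. The keyword list, the tokenisation rule, and the definition of match degree as the percentage of predefined keywords found in the mail are all assumptions for illustration; real content filters are considerably more elaborate.

```python
import re
from typing import Iterable

# Hypothetical predefined spam keywords.
SPAM_KEYWORDS = frozenset({"lottery", "winner", "free", "discount"})


def keyword_match_degree(body: str, keywords: Iterable[str] = SPAM_KEYWORDS) -> float:
    """Return the share (0-100) of predefined keywords that occur in the mail body."""
    keyword_set = {k.lower() for k in keywords}
    if not keyword_set:
        return 0.0
    # Extract lowercase word tokens from the body.
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    hits = len(words & keyword_set)
    return 100.0 * hits / len(keyword_set)


print(keyword_match_degree("You are a WINNER of the free lottery!"))
print(keyword_match_degree("See you at lunch tomorrow."))
```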
The spam identification method above decides whether an e-mail is spam according to the user communication coefficient. Because this coefficient is built from user communication behavior and represents the degree of contact between the addressee and the sender, the scheme is not easily evaded, takes each user's individual communication relationships fully into account, and improves the accuracy of spam identification and filtering.
An exemplary method of building the user communication coefficient from user communication behavior is given below. As shown in Fig. 2, the method comprises the following steps:
S202, collect user communication behavior information. One way of collecting it is to receive the user communication behavior information that users upload through their terminals.
The user communication behavior information includes, but is not limited to, the correspondence between contacts' mailboxes and telephone numbers, blacklist/whitelist data, voice communication data, short message communication data, and e-mail communication data. The blacklist/whitelist data include, but are not limited to, the blacklist and whitelist in the user's address book. The voice communication data include, but are not limited to, rejected calls, call frequency, and the frequency of calls initiated by the user. The short message communication data include, but are not limited to, short message exchange frequency, the frequency of messages initiated by the user, and telephone numbers marked as sources of spam messages. The e-mail communication data include, but are not limited to, the frequency of e-mail sent and received, the frequency of e-mail initiated by the user, and e-mail addresses marked as spam.
S204, form a user communication behavior vector (User Communication Behavior Vector, UCBV for short) from the collected user communication behavior information.
As an exemplary way of forming the user communication behavior vector, the user's e-mail address and telephone number can be extracted from the user communication behavior information, a user communication behavior vector can be formed with this e-mail address as its primary index, and the user's telephone number can be added to the vector. The e-mail addresses of the user's contacts are then extracted from the user communication behavior information to form subvectors of the user communication behavior vector, with each contact's e-mail address as a secondary index. Finally, the components of each subvector are formed from the user communication behavior information.
S206, form the user communication coefficient from the user communication behavior vector.
As an exemplary way of forming the user communication coefficient, a weighted sum of the components of the subvector for the user and the contact in the user communication behavior vector can be computed according to their respective weights. The user communication coefficient is determined from the result and represents the degree of contact between the user and the contact, for example the degree of familiarity or acceptance. The larger the result, the larger the user communication coefficient and the closer the contact between the user and this contact.
For example, the user communication coefficient can be a number between 0 and 100. A value of 0 means that the user strongly rejects this contact: the contact is on the blacklist, the contact's calls are rejected, the contact's short messages are marked as spam messages, and the contact's e-mail is marked as spam. A value of 100 means that the user fully accepts this contact: the contact is on the whitelist, and the user frequently initiates voice, short message, and e-mail contact. Those skilled in the art can of course assign other meanings to the levels as needed, and can add or remove levels.
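The weighted sum in S206 can be sketched as follows, assuming each component has already been expressed on a 0-100 scale. The weights, the numeric mapping of the blacklist/whitelist enumeration (blacklist 0, whitelist 100, neither 50), and the extra NONE member are assumptions for illustration and do not come from the source.

```python
from enum import Enum
from typing import Dict, Optional


class ListStatus(Enum):
    BLACKLIST = "blacklist"
    WHITELIST = "whitelist"
    NONE = "none"  # assumed extra state: the contact is on neither list


# Assumed numeric mapping of the enumerated blacklist/whitelist component.
LIST_SCORE = {ListStatus.BLACKLIST: 0.0, ListStatus.WHITELIST: 100.0, ListStatus.NONE: 50.0}

# Assumed weights; the source only says each component has its own weight.
DEFAULT_WEIGHTS = {"bw_list": 0.4, "voice": 0.2, "sm": 0.2, "email": 0.2}


def communication_coefficient(
    bw_list: ListStatus,
    voice: float,
    sm: float,
    email: float,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of the four subvector components, normalised to 0-100."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    components = {"bw_list": LIST_SCORE[bw_list], "voice": voice, "sm": sm, "email": email}
    total_weight = sum(weights[name] for name in components)
    score = sum(weights[name] * value for name, value in components.items()) / total_weight
    # Clamp to guard against floating-point drift or out-of-range inputs.
    return max(0.0, min(100.0, score))


print(communication_coefficient(ListStatus.WHITELIST, voice=100, sm=100, email=100))
print(communication_coefficient(ListStatus.BLACKLIST, voice=0, sm=0, email=0))
```

Dividing by the total weight keeps the result on the same 0-100 scale as the components even when the weights do not sum to one.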
After the user communication coefficient has been built from user communication behavior as described above, the method shown in Fig. 1 can be used to identify spam.
User communication behavior vectors can be kept in the user communication behavior vector library. A concrete procedure for forming a user communication behavior vector is given below. As shown in Fig. 3, it comprises the following steps:
S302, after user communication behavior information has been collected, query the user communication behavior vector library for a vector whose primary index is this user's e-mail address. If the vector exists, go directly to step S306.
S304, if the vector does not exist, create a user communication behavior vector with this user's e-mail address as its primary index, then go to step S306.
S306, extract the e-mail address of the user's contact from the user communication behavior information, and query the user communication behavior vector indexed by this user for a subvector whose secondary index is the contact's e-mail address. If the subvector exists, go directly to step S310.
S308, if the subvector does not exist, create a subvector indexed by this contact's e-mail address, then go to step S310.
S310, form each component of the subvector from the user communication behavior information.
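Steps S302 to S310 amount to a create-if-missing update on a two-level index. A minimal sketch follows, with the vector library modelled as an in-memory dictionary; the class names, field names, and the function upsert_behavior are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class SubVector:
    """Components describing the user's behavior toward one contact."""
    bw_list: str = "none"
    voice: float = 0.0
    sm: float = 0.0
    email: float = 0.0


@dataclass
class BehaviorVector:
    """User communication behavior vector, indexed by the user's e-mail address."""
    email_address: str
    phone_codes: List[str] = field(default_factory=list)
    contacts: Dict[str, SubVector] = field(default_factory=dict)


def upsert_behavior(
    library: Dict[str, BehaviorVector],
    user_email: str,
    contact_email: str,
    components: Dict[str, Union[str, float]],
) -> SubVector:
    # S302 / S304: find the vector by primary index, creating it if absent.
    vector = library.get(user_email)
    if vector is None:
        vector = BehaviorVector(email_address=user_email)
        library[user_email] = vector
    # S306 / S308: find the subvector by secondary index, creating it if absent.
    sub = vector.contacts.get(contact_email)
    if sub is None:
        sub = SubVector()
        vector.contacts[contact_email] = sub
    # S310: write the components derived from the behavior information.
    for name, value in components.items():
        if not hasattr(sub, name):
            raise KeyError(f"unknown component: {name}")
        setattr(sub, name, value)
    return sub


library: Dict[str, BehaviorVector] = {}
upsert_behavior(library, "user@example.com", "friend@example.com", {"voice": 40.0})
upsert_behavior(library, "user@example.com", "friend@example.com", {"voice": 60.0})
print(library["user@example.com"].contacts["friend@example.com"])
```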
The components can include, but are not limited to, a blacklist/whitelist component, a voice component, a short message component, and an e-mail component. An exemplary way of forming these four components is introduced below.
The blacklist/whitelist component can be generated from the blacklist/whitelist data in the collected user communication behavior information, to indicate whether the contact is on the user's blacklist or whitelist. For example, if the user places a contact on the blacklist, the blacklist/whitelist component records that this contact is a blacklisted user of this user. Likewise, if the user places a contact on the whitelist, the component records that this contact is a whitelisted user of this user. In a concrete implementation, the blacklist/whitelist component can be an enumerated type whose values include blacklist and whitelist.
The voice component can be generated from the voice communication data in the collected user communication behavior information, to represent the frequency of voice contact between the user and the contact and how actively the user responds to the contact's incoming calls. For example, the voice component can be a number between 0 and 100. A value of 0 means that the user never calls this contact and rejects all of the contact's incoming calls. A value of 100 means that the user initiates calls with this contact at the highest frequency (a frequency threshold can be preset; any frequency above it counts as the highest frequency) and answers all of the contact's incoming calls promptly. Those skilled in the art can of course assign other meanings to the levels as needed, and can add or remove levels.
The short message component can be generated from the short message communication data in the collected user communication behavior information, to represent the frequency of short message contact between the user and the contact and how actively the user responds to the contact's messages. For example, the short message component can be a number between 0 and 100. A value of 0 means that the user never sends short messages to this contact and marks all messages from the contact as spam messages. A value of 100 means that the user initiates short messages to this contact at the highest frequency and replies promptly to all of the contact's messages. Those skilled in the art can of course assign other meanings to the levels as needed, and can add or remove levels.
The e-mail component can be generated from the e-mail communication data in the collected user communication behavior information, to represent the frequency of e-mail contact between the user and the contact and how actively the user responds to the contact's e-mail. For example, the e-mail component can be a number between 0 and 100. A value of 0 means that the user never sends e-mail to this contact and marks all e-mail from the contact as spam. A value of 100 means that the user initiates e-mail to this contact at the highest frequency and replies promptly to all e-mail from the contact. Those skilled in the art can of course assign other meanings to the levels as needed, and can add or remove levels.
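The text fixes only the two endpoints of each 0-100 component and leaves the values in between open. One possible way to fill that gap for the e-mail component is sketched below: half of the score comes from how often the user initiates e-mail (capped at a preset frequency threshold, as the voice example suggests) and half from the share of the contact's e-mail that the user replies to. The formula, the equal split, the parameter names, and the cap of 20 messages per month are all assumptions.

```python
def email_component(
    sent_per_month: float,
    received: int,
    replied: int,
    marked_spam: bool,
    freq_cap: float = 20.0,  # assumed preset frequency threshold
) -> float:
    """Return an assumed 0-100 e-mail component for one user/contact pair."""
    # Assumption: marking the contact's address as spam forces the lowest level.
    if marked_spam:
        return 0.0
    # Initiative: how close the user's sending frequency is to the cap.
    initiative = min(max(sent_per_month, 0.0), freq_cap) / freq_cap
    # Responsiveness: share of the contact's e-mail that the user replied to.
    responsiveness = min(1.0, replied / received) if received > 0 else 0.0
    return 100.0 * (0.5 * initiative + 0.5 * responsiveness)


print(email_component(sent_per_month=10, received=10, replied=5, marked_spam=False))
```

The voice and short message components could be computed in the same way from call and message counters.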
Referring to Fig. 4, a schematic diagram of an exemplary user communication behavior vector library (UCBVL for short) is described below.
In Fig. 4, UCBV[eMailAddress] denotes the user communication behavior vector of one user; with n users, UCBV[eMailAddressn] denotes the user communication behavior vector of user n. Here eMailAddress is the user's e-mail address and PhoneCodes is the user's telephone number (mobile phone numbers are used here, and a user may have one or more); eMailAddress and PhoneCodes together associate the user's e-mail address with the user's telephone numbers. eMailAddress_r denotes a contact's e-mail address; if user 1 has m contacts, eMailAddress_rm is the e-mail address of contact m. A user and a contact together determine one subvector, which describes the one-way communication behavior between the user and the contact. Each subvector in turn comprises four components: BWList (blacklist/whitelist component), voice (voice communication component), SM (SMS communication component), and eMail (e-mail communication component).
In addition, every e-mail address can form a vector, and this vector contains the user's own e-mail address and telephone number. For users A and B, the subvector found with A as the master index and B as the secondary index may differ from the subvector found with B as the master index and A as the secondary index; this reflects the difference between the two directions of communication behavior between the two users.
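The two-level indexing described for Fig. 4 can be sketched as a nested mapping. This is an illustrative layout only: the class names, the field names, the sample addresses, and the 0/50/100 encoding of the BWList component are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContactSubvector:
    """One-way behavior of a user toward one contact (four components)."""
    bw_list: int = 50  # assumed encoding: 0 blacklisted, 100 whitelisted, 50 neither
    voice: int = 0     # voice communication component, 0-100
    sm: int = 0        # SMS communication component, 0-100
    email: int = 0     # e-mail communication component, 0-100


@dataclass
class UserVector:
    """UCBV[eMailAddress]: master index is the user's own e-mail address."""
    email_address: str
    phone_codes: List[str] = field(default_factory=list)
    # Secondary index: contact e-mail address -> subvector.
    contacts: Dict[str, ContactSubvector] = field(default_factory=dict)


def lookup(library: Dict[str, UserVector], user: str,
           contact: str) -> Optional[ContactSubvector]:
    """Find the subvector with `user` as master index and `contact` as secondary index."""
    vector = library.get(user)
    return vector.contacts.get(contact) if vector else None


# The library is asymmetric: A's view of B need not match B's view of A.
ucbvl: Dict[str, UserVector] = {
    "a@example.com": UserVector("a@example.com", ["13800000001"],
                                {"b@example.com": ContactSubvector(100, 80, 70, 90)}),
    "b@example.com": UserVector("b@example.com", ["13800000002"],
                                {"a@example.com": ContactSubvector(50, 0, 5, 10)}),
}
```

Here `lookup(ucbvl, "a@example.com", "b@example.com")` and `lookup(ucbvl, "b@example.com", "a@example.com")` return different subvectors, which is the directional difference the paragraph above describes.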
Fig. 5 is a complete schematic flow chart of spam identification and filtering according to the present invention. As shown in Fig. 5, the method of this embodiment comprises the following steps:
S501, collect the user communication behavior information between the user and the user's contacts, including the correspondence between the user's e-mail address and telephone number, and communication behaviors between the user and the contacts such as blacklist/whitelist settings, voice calls, SMS, and mail.
S502, generate the user communication behavior vector from the collected user communication behavior information.
S503, generate the contact subvectors from the collected user communication behavior information.
S504, generate each component of the subvectors from the collected user communication behavior information; the user communication behavior vector can be saved to the user communication behavior vector library.
S505, based on the contact subvectors of the user communication behavior vector and their components, generate the one-way user communication coefficient between the user and the contact; the user communication coefficient can be saved to the user communication coefficient database.
S506, when a new mail arrives, extract the recipient's e-mail address and the sender's e-mail address.
S507, using the extracted recipient and sender e-mail addresses, query the user communication coefficient database for the matching user communication coefficient.
S508, make a comprehensive judgment from the user communication coefficient and the result of a conventional spam identification system, for example by computing a weighted sum of the identification results of the user communication coefficient and the conventional spam identification system according to their respective weights.
S509, compare the judgment result with a preset suspicious-spam threshold to determine whether the mail is spam for the recipient.
Alternatively, whether the mail is spam can be determined from the user communication coefficient alone: the user communication coefficient is compared directly with the preset suspicious-spam threshold to determine whether the mail is spam for the recipient.
S510, if the mail is judged not to be spam, deliver it normally.
S511, if the mail is judged to be spam, intercept it.
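Steps S506 to S511 can be sketched as a single decision function. This is a sketch under stated assumptions rather than the patent's implementation: both scores are taken to lie in 0-100, a higher coefficient is taken to mean closer contact (so the suspicion it contributes is `100 - coefficient`), the weights and threshold are arbitrary, and falling back to the conventional score when no coefficient exists is an assumption because the text does not specify that case.

```python
from typing import Dict, Tuple


def classify(sender: str, recipient: str,
             coefficients: Dict[Tuple[str, str], float],
             conventional_spam_score: float,
             w_coeff: float = 0.6, w_conv: float = 0.4,
             threshold: float = 50.0) -> str:
    """Return "deliver" or "intercept" for one incoming mail.

    `coefficients` maps (recipient, sender) to the recipient's one-way
    user communication coefficient toward that sender.
    """
    # S507: look up the coefficient of this sender relative to this recipient.
    coefficient = coefficients.get((recipient, sender))
    if coefficient is None:
        # Assumption: with no coefficient, rely on the conventional result alone.
        score = conventional_spam_score
    else:
        # S508: weighted sum of the two identification results.
        score = w_coeff * (100.0 - coefficient) + w_conv * conventional_spam_score
    # S509-S511: compare with the preset suspicious-spam threshold.
    return "intercept" if score >= threshold else "deliver"


coefficients = {("bob@example.com", "alice@example.com"): 90.0}
```

With these sample values, a mail from `alice@example.com` to `bob@example.com` is delivered even if the conventional filter scores it at 80, because the recipient's close contact with the sender outweighs the content-based suspicion.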
In addition, the spam identification process can be optimized according to user feedback on the correctness of the spam identification results, for example by optimizing how the values of the user communication behavior vector are generated, optimizing the weights with which the components of the user communication behavior vector contribute to the user communication coefficient, and optimizing the weights given to the user communication coefficient and to the conventional spam identification system in the final spam judgment.
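The text names feedback-driven weight optimization but gives no update rule. One very simple possibility, shown purely as an assumption, is to shift the weight of the user communication coefficient by a small step toward whichever of the two signals agreed with the user's feedback. The function name, the step size, and the rule itself are hypothetical.

```python
def update_coefficient_weight(w_coeff: float, coeff_said_spam: bool,
                              conventional_said_spam: bool, user_says_spam: bool,
                              step: float = 0.05) -> float:
    """Hypothetical feedback rule; returns the new weight, clamped to [0, 1]."""
    coeff_right = coeff_said_spam == user_says_spam
    conventional_right = conventional_said_spam == user_says_spam
    if coeff_right and not conventional_right:
        w_coeff += step   # trust the communication coefficient more
    elif conventional_right and not coeff_right:
        w_coeff -= step   # trust the conventional result more
    # If both were right or both were wrong, the weight is left unchanged.
    return min(max(w_coeff, 0.0), 1.0)
```

The weight of the conventional result would then be taken as `1 - w_coeff`, which keeps the two weights summing to one.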
The above spam identification and filtering method identifies whether an e-mail is spam according to the user communication coefficient. The user communication coefficient is built from user communication behavior and represents the degree of contact between the recipient and the sender. This spam identification scheme based on user communication behavior is therefore not easily evaded, takes full account of each user's individual communication relationships, and improves the accuracy of spam identification and filtering. In addition, a comprehensive judgment can be made from the user communication coefficient and the identification result of the conventional spam identification system to determine whether the e-mail is spam for the recipient, which can further improve the accuracy of spam identification and filtering.
Fig. 6 is a structural schematic diagram of an embodiment of the spam identification system of the present invention. As shown in Fig. 6, the spam identification system of this embodiment comprises:
A user communication behavior information receiving unit 602, configured to receive the user communication behavior information uploaded by the user through a terminal; a user communication behavior vector library 604, configured to form user communication behavior vectors from the user communication behavior information, where each subvector of a user communication behavior vector represents one of the user's contacts and comprises components that reflect the contact between that contact and the user over different communication modes; a user communication coefficient database 606, configured to form user communication coefficients from the user communication behavior vectors, to represent the degree of contact between the user and the contact; and a spam comprehensive processing unit 608, configured to extract the e-mail addresses of the sender and the recipient of a received e-mail; to look up in the user communication coefficient database, according to the e-mail addresses of the sender and the recipient, whether a user communication coefficient of the sender relative to the recipient exists; and, if a corresponding user communication coefficient exists, to identify according to the user communication coefficient whether the e-mail is spam for the recipient.
The spam comprehensive processing unit 608 is further configured to make a comprehensive judgment from the user communication coefficient and the identification result of the conventional spam identification system, to determine whether the e-mail is spam for the recipient. The conventional spam identification system can match the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result.
As an exemplary implementation, the user communication coefficient database 606 is specifically configured to compute a weighted sum of the components of the subvector for the user and the contact in the user communication behavior vector, according to the weight of each component, and to determine the user communication coefficient from the result, to represent the degree of contact between the user and the contact.
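Written out, the weighted summation is $C = \sum_i w_i c_i$ over the four components $c_i$ of the subvector. A minimal sketch follows; the weight values and the dictionary keys are assumptions chosen for illustration, since the patent does not fix them.

```python
from typing import Dict

# Hypothetical weights for the four components; they sum to 1 so that the
# coefficient stays on the same 0-100 scale as the components.
WEIGHTS: Dict[str, float] = {"bw_list": 0.4, "voice": 0.2, "sm": 0.2, "email": 0.2}


def communication_coefficient(subvector: Dict[str, float],
                              weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the subvector components for one (user, contact) pair."""
    return sum(weight * subvector[name] for name, weight in weights.items())
```

For a subvector `{"bw_list": 100, "voice": 50, "sm": 0, "email": 100}` and the weights above, the sum is 40 + 10 + 0 + 20, giving a coefficient of 70.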
Fig. 7 is a structural schematic diagram of another embodiment of the spam identification system of the present invention. As shown in Fig. 7, as an exemplary implementation, the user communication behavior vector library 604 comprises:
A vector forming unit 7042, configured to extract the user's e-mail address and telephone number from the user communication behavior information, form a user communication behavior vector with this e-mail address as the master index, and add the user's telephone number to the user communication behavior vector; a subvector forming unit 7044, configured to extract the e-mail addresses of the user's contacts from the user communication behavior information and form subvectors of the user communication behavior vector with each contact's e-mail address as the secondary index; and a component forming unit 7046, configured to form the components of the subvectors from the user communication behavior information.
The component forming unit 7046 comprises at least one of the following units:
A blacklist/whitelist component forming unit 7046a, configured to generate the blacklist/whitelist component from the blacklist/whitelist data in the collected user communication behavior information, to indicate whether the contact is on the user's blacklist or whitelist.
A voice component forming unit 7046b, configured to generate the voice component from the voice communication data in the collected user communication behavior information, to represent the frequency of voice contact between the user and the contact and how actively the user responds to the contact's incoming calls.
An SMS component forming unit 7046c, configured to generate the SMS component from the SMS communication data in the collected user communication behavior information, to represent the frequency of SMS contact between the user and the contact and how actively the user responds to the contact's messages.
An e-mail component forming unit 7046d, configured to generate the e-mail component from the e-mail communication data in the collected user communication behavior information, to represent the frequency of mail contact between the user and the contact and how actively the user responds to the contact's mail.
The above spam identification system builds the user communication behavior vector library from the user communication behavior information and then builds the user communication coefficient database. When an e-mail is subsequently received, the system extracts the e-mail addresses of the sender and the recipient and, according to these addresses, looks up in the user communication coefficient database whether a user communication coefficient of the sender relative to the recipient exists. If a corresponding user communication coefficient exists, the system identifies according to the coefficient whether the e-mail is spam for the recipient. This spam identification scheme based on user communication behavior is not easily evaded, takes full account of each user's individual communication relationships, and improves the accuracy of spam identification and filtering. In addition, a comprehensive judgment can be made from the user communication coefficient and the identification result of the conventional spam identification system to determine whether the e-mail is spam for the recipient, which can further improve the accuracy of spam identification and filtering.
Fig. 8 is a structural schematic diagram of an embodiment of the spam filtering system of the present invention. As shown in Fig. 8, the system comprises: the spam identification system 802 provided by the present invention, a mail transfer agent system 804, and a mail delivery agent system 806.
The mail transfer agent system 804 is configured to forward the user's e-mail to the spam identification system 802; the spam identification system 802 identifies according to the user communication coefficient whether the e-mail is spam for the recipient; the mail delivery agent system 806 delivers or intercepts the e-mail according to the identification result of the spam identification system 802.
As another implementation, as shown in Fig. 9, the system further comprises a conventional spam identification system 908, and the mail transfer agent system 804 is further configured to forward the user's e-mail to the conventional spam identification system 908; the conventional spam identification system 908 is configured to match the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result; the spam identification system 802 is further configured to make a comprehensive judgment by combining the user communication coefficient with the matching result of the conventional spam identification system 908, to determine whether the e-mail is spam for the recipient.
The conventional spam identification system 908 can identify spam from features of the mail itself. For example, mail features are defined in terms of keywords, mail structure, and the like; the relevant content of a new mail (for example its keywords or mail structure) is then extracted and matched against the defined mail features, and whether the mail is spam is identified from the degree of matching. Of course, the conventional spam identification system can also use other known spam identification methods, and the present invention is not limited in this respect.
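The keyword-matching variant of such a conventional system can be sketched as follows. This is an illustrative stand-in, not the system referred to as 908: the keyword list, the tokenization, and the definition of matching degree as the share of configured keywords found in the mail are all assumptions.

```python
import re
from typing import Set

# Hypothetical preset spam features (keywords only; mail structure is omitted).
SPAM_KEYWORDS: Set[str] = {"lottery", "winner", "free", "invoice"}


def keyword_match_score(subject: str, body: str,
                        keywords: Set[str] = SPAM_KEYWORDS) -> float:
    """Matching degree in 0-100: percentage of preset keywords present in the mail."""
    if not keywords:
        return 0.0
    # Extract lower-cased word tokens from the subject and body.
    words = set(re.findall(r"[a-z0-9']+", f"{subject} {body}".lower()))
    return 100.0 * len(words & keywords) / len(keywords)
```

A score of this kind could serve as the conventional result that is combined with the user communication coefficient in the comprehensive judgment.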
The above spam filtering system builds the user communication coefficient from the user communication behavior information and then identifies according to the user communication coefficient whether an e-mail is spam for the recipient. This spam identification scheme based on user communication behavior is not easily evaded, takes full account of each user's individual communication relationships, and improves the accuracy of spam identification and filtering. In addition, a comprehensive judgment can be made from the user communication coefficient and the identification result of the conventional spam identification system to determine whether the e-mail is spam for the recipient, which can further improve the accuracy of spam identification and filtering.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented in hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A spam identification method, comprising:
Extracting the e-mail addresses of the sender and the recipient of a received e-mail;
Looking up in a user communication coefficient database, according to the e-mail addresses of the sender and the recipient, whether a user communication coefficient of the sender relative to the recipient exists, the user communication coefficient being built from user communication behavior and representing the degree of contact between the recipient and the sender;
If a corresponding user communication coefficient exists, identifying according to the user communication coefficient whether the e-mail is spam for the recipient.
2. The method according to claim 1, characterized in that, before the e-mail is received, the method further comprises an operation of establishing a user communication behavior vector library and the user communication coefficient database based on user communication behavior information, which specifically comprises:
Collecting user communication behavior information;
Forming user communication behavior vectors from the collected user communication behavior information, and saving the user communication behavior vectors to the user communication behavior vector library;
Forming user communication coefficients from the user communication behavior vectors, and saving the user communication coefficients to the user communication coefficient database.
3. The method according to claim 2, characterized in that the operation of collecting user communication behavior information specifically comprises:
Receiving the user communication behavior information uploaded by the user through a terminal, the user communication behavior information comprising the correspondence between contacts' mailboxes and telephone numbers, blacklist/whitelist data, voice communication data, SMS communication data, and e-mail communication data; the blacklist/whitelist data comprising the blacklist and whitelist of the user's address book; the voice communication data comprising rejected calls, call frequency, and the frequency of calls initiated by the user; the SMS communication data comprising SMS communication frequency, the frequency of messages initiated by the user, and telephone numbers marked as sources of spam SMS; the e-mail communication data comprising mail sending and receiving frequency, the frequency of mail initiated by the user, and e-mail addresses marked as sources of spam.
4. The method according to claim 2, characterized in that the operation of forming user communication behavior vectors from the collected user communication behavior information specifically comprises:
Extracting the user's e-mail address and telephone number from the user communication behavior information, forming a user communication behavior vector with this e-mail address as the master index, and adding the user's telephone number to the user communication behavior vector;
Extracting the e-mail addresses of the user's contacts from the user communication behavior information, and forming subvectors of the user communication behavior vector with each contact's e-mail address as the secondary index;
Forming the components of the subvectors from the user communication behavior information.
5. The method according to claim 4, characterized in that the operation of forming the components of the subvectors from the user communication behavior information specifically comprises:
Generating a blacklist/whitelist component from the blacklist/whitelist data in the collected user communication behavior information, to indicate whether the contact is on the user's blacklist or whitelist;
Generating a voice component from the voice communication data in the collected user communication behavior information, to represent the frequency of voice contact between the user and the contact and how actively the user responds to the contact's incoming calls;
Generating an SMS component from the SMS communication data in the collected user communication behavior information, to represent the frequency of SMS contact between the user and the contact and how actively the user responds to the contact's messages;
Generating an e-mail component from the e-mail communication data in the collected user communication behavior information, to represent the frequency of mail contact between the user and the contact and how actively the user responds to the contact's mail.
6. The method according to claim 2, characterized in that the operation of forming user communication coefficients from the user communication behavior vectors is specifically:
Computing a weighted sum of the components of the subvector for the user and the contact in the user communication behavior vector, according to the weight of each component;
Determining the user communication coefficient from the result, to represent the degree of contact between the user and the contact.
7. The method according to claim 1, characterized in that identifying according to the user communication coefficient whether the e-mail is spam for the recipient further comprises:
Matching the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result;
Making a comprehensive judgment from the user communication coefficient and the matching result, to determine whether the e-mail is spam for the recipient.
8. A spam identification system, comprising:
A user communication behavior information receiving unit, configured to receive the user communication behavior information uploaded by the user through a terminal;
A user communication behavior vector library, configured to form user communication behavior vectors from the user communication behavior information, wherein each subvector of a user communication behavior vector represents one of the user's contacts and comprises components that reflect the contact between that contact and the user over different communication modes;
A user communication coefficient database, configured to form user communication coefficients from the user communication behavior vectors, to represent the degree of contact between the user and the contact; and
A spam comprehensive processing unit, configured to extract the e-mail addresses of the sender and the recipient of a received e-mail; to look up in the user communication coefficient database, according to the e-mail addresses of the sender and the recipient, whether a user communication coefficient of the sender relative to the recipient exists; and, if a corresponding user communication coefficient exists, to identify according to the user communication coefficient whether the e-mail is spam for the recipient.
9. The system according to claim 8, characterized in that the user communication behavior vector library comprises:
A vector forming unit, configured to extract the user's e-mail address and telephone number from the user communication behavior information, form a user communication behavior vector with this e-mail address as the master index, and add the user's telephone number to the user communication behavior vector;
A subvector forming unit, configured to extract the e-mail addresses of the user's contacts from the user communication behavior information and form subvectors of the user communication behavior vector with each contact's e-mail address as the secondary index; and
A component forming unit, configured to form the components of the subvectors from the user communication behavior information.
10. The system according to claim 9, characterized in that the component forming unit comprises at least one of the following units:
A blacklist/whitelist component forming unit, configured to generate a blacklist/whitelist component from the blacklist/whitelist data in the collected user communication behavior information, to indicate whether the contact is on the user's blacklist or whitelist;
A voice component forming unit, configured to generate a voice component from the voice communication data in the collected user communication behavior information, to represent the frequency of voice contact between the user and the contact and how actively the user responds to the contact's incoming calls;
An SMS component forming unit, configured to generate an SMS component from the SMS communication data in the collected user communication behavior information, to represent the frequency of SMS contact between the user and the contact and how actively the user responds to the contact's messages;
An e-mail component forming unit, configured to generate an e-mail component from the e-mail communication data in the collected user communication behavior information, to represent the frequency of mail contact between the user and the contact and how actively the user responds to the contact's mail.
11. The system according to claim 8, characterized in that the user communication coefficient database is further configured to match the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result, and to make a comprehensive judgment from the user communication coefficient and the matching result, to determine whether the e-mail is spam for the recipient.
12. A spam filtering system, comprising: the spam identification system according to any one of claims 8 to 11, a mail transfer agent system, and a mail delivery agent system;
The mail transfer agent system is configured to forward the user's e-mail to the spam identification system;
The spam identification system identifies according to the user communication coefficient whether the e-mail is spam for the recipient;
The mail delivery agent system delivers or intercepts the e-mail according to the identification result of the spam identification system.
13. The system according to claim 12, characterized in that the system further comprises a conventional spam identification system,
The mail transfer agent system is further configured to forward the user's e-mail to the conventional spam identification system;
The conventional spam identification system is configured to match the keywords or mail structure of the e-mail against preset spam mail features to obtain a matching result;
The spam identification system is further configured to make a comprehensive judgment by combining the user communication coefficient with the matching result, to determine whether the e-mail is spam for the recipient.
CN201210442421.0A 2012-11-08 2012-11-08 Identification method, identification system, and filter system of spam mail Pending CN103812826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210442421.0A CN103812826A (en) 2012-11-08 2012-11-08 Identification method, identification system, and filter system of spam mail

Publications (1)

Publication Number Publication Date
CN103812826A true CN103812826A (en) 2014-05-21

Family

ID=50709033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210442421.0A Pending CN103812826A (en) 2012-11-08 2012-11-08 Identification method, identification system, and filter system of spam mail

Country Status (1)

Country Link
CN (1) CN103812826A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7693945B1 (en) * 2004-06-30 2010-04-06 Google Inc. System for reclassification of electronic messages in a spam filtering system
CN101087259A (en) * 2006-06-07 2007-12-12 深圳市都护网络科技有限公司 A system for filtering spam in Internet and its implementation method
CN101640647A (en) * 2008-07-31 2010-02-03 中兴通讯股份有限公司 Email sending service device, email sending service system and email sending method
CN101588558A (en) * 2009-03-30 2009-11-25 网易(杭州)网络有限公司 Spam filtering method and system
CN101925020A (en) * 2009-06-15 2010-12-22 北京华智大为科技有限公司 Method and system for binding E-mail addresses and mobile phone number
CN101674264A (en) * 2009-10-20 2010-03-17 哈尔滨工程大学 Spam detection device and method based on user relationship mining and credit evaluation
CN101771966A (en) * 2010-03-11 2010-07-07 上海粱江通信系统股份有限公司 Keywords and frequency based method for identifying spam message sources
CN101977360A (en) * 2010-09-30 2011-02-16 北京新媒传信科技有限公司 Junk short message filter method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104243501A (en) * 2014-10-14 2014-12-24 四川神琥科技有限公司 Filtering and intercepting method for junk mail
CN104243501B (en) * 2014-10-14 2017-04-12 四川神琥科技有限公司 Filtering and intercepting method for junk mail
CN106941440B (en) * 2016-01-04 2020-09-01 五八同城信息技术有限公司 Session anti-harassment method and device
CN106941440A (en) * 2016-01-04 2017-07-11 五八同城信息技术有限公司 A kind of session anti-clutter method and device
CN105847123A (en) * 2016-04-19 2016-08-10 乐视控股(北京)有限公司 Spam mail recognition method and device
CN106685796B (en) * 2016-06-29 2018-09-04 腾讯科技(深圳)有限公司 A kind of information identifying method, device and system
CN106685796A (en) * 2016-06-29 2017-05-17 腾讯科技(深圳)有限公司 Information identification method, device and system
CN108270932A (en) * 2016-12-30 2018-07-10 中国移动通信集团公司 A kind of recognition methods of communicating number and device
CN107104887A (en) * 2017-06-01 2017-08-29 珠海格力电器股份有限公司 A kind of instant message based reminding method, device and its user terminal
CN109391535A (en) * 2017-08-02 2019-02-26 阿里巴巴集团控股有限公司 The contact person of domain grade determines method, spam judgment method and device
CN109391535B (en) * 2017-08-02 2022-03-04 阿里巴巴集团控股有限公司 Domain-level contact person determining method, and junk mail judging method and device
CN110213152A (en) * 2018-05-02 2019-09-06 腾讯科技(深圳)有限公司 Identify method, apparatus, server and the storage medium of spam
CN110213152B (en) * 2018-05-02 2021-09-14 腾讯科技(深圳)有限公司 Method, device, server and storage medium for identifying junk mails
CN108429672A (en) * 2018-05-29 2018-08-21 深圳邮信互联软件信息平台有限公司 mail receiving method and device
CN110661750A (en) * 2018-06-28 2020-01-07 深信服科技股份有限公司 Mail sender identity detection method, system, equipment and storage medium
CN110838972A (en) * 2019-11-29 2020-02-25 北京春笛网络信息技术服务有限公司 Spam filtering method based on friend list

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20140521