CN101771966A - Keywords and frequency based method for identifying spam message sources - Google Patents


Info

Publication number
CN101771966A
Authority
CN
China
Prior art keywords
keyword
source
thresholds
max
note
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010121687
Other languages
Chinese (zh)
Other versions
CN101771966B (en)
Inventor
肖克华
伍贳跟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Original Assignee
LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Priority to CN 201010121687
Publication of CN101771966A
Application granted
Publication of CN101771966B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method for identifying spam message sources based on keywords and frequency. The method comprises the following steps: setting a keyword message frequency threshold, which is the maximum number of messages violating the same keyword that a calling source may send within a period of time, and, when the number of messages violating the same keyword sent by a calling source exceeds this maximum threshold, blocking the messages from that calling source that match the keyword; and setting a blacklist threshold, which is the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold, and, when the number of different keywords violated by the calling source exceeds this maximum threshold, determining that the calling source is a spam message source and blocking all of its messages. By combining keywords with frequency and blacklisting sources that violate several keywords at once, in line with the sending characteristics of spam messages, the method can block spam messages effectively.

Description

A method for identifying spam message sources based on keywords and frequency
Technical field
The present invention relates to the field of telecommunications, and in particular to the monitoring and control of spam messages. It is a method that identifies spam message sources by combining keywords with sending frequency.
Background art
The Short Message Service (SMS) is a basic service of mobile communication networks. While it gives users a convenient way to exchange messages, it also provides a channel for spreading junk information. Spam messages are becoming increasingly rampant; they not only lead to customer complaints but also bring the problem of malicious arrears, so spam messages need to be monitored and intercepted in real time.
The sending characteristics of spam messages are: 1. the sending frequency is relatively high; 2. the content is repetitive, is mostly fraud or advertising, and matches keyword features.
Current means of discovering spam message sources are mostly based on pure frequency statistics and keywords. If the frequency value is set too high, spam message sources are missed; if it is set too low, the false-interception rate is high. If the keyword conditions are simple, messages are easily blocked by mistake; if the keyword conditions are complex, they cannot cover all spam messages.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art by providing a method for identifying spam message sources based on keywords and frequency. The method relies on a sending characteristic of spam messages, namely that normal users do not send large numbers of messages whose content violates different keywords. It identifies spam message users by combining keyword analysis of messages with the sending frequency and the number of different keywords violated, and restricts the identified source from sending messages. It is an effective method for intercepting spam messages.
The technical solution that achieves the above object is a method for identifying spam message sources based on keywords and frequency, wherein:
A keyword message frequency threshold is set. This threshold is the maximum threshold for violating the same keyword within a period of time. When the number of messages violating the same keyword that a calling source sends within the set time range exceeds the maximum threshold of the keyword message frequency threshold, the messages from this calling source that match this keyword are intercepted;
A blacklist threshold is set, namely the maximum threshold for the number of different keywords for which a calling source exceeds the keyword message frequency threshold within the set time range. When the number of different keywords violated by the calling source within the set time range exceeds this maximum threshold, the calling source is judged to be a spam message source and all messages from this calling source are intercepted.
The above method for identifying spam message sources based on keywords and frequency comprises the following steps:
Step S1. Set a keyword definition table, which contains the keywords and their corresponding keyword codes;
Step S2. Set threshold condition A of the keyword message frequency threshold, namely that the maximum threshold for messages violating the same keyword sent by a calling source within time period P is M1, where P is a positive number and M1 is a positive integer;
Step S3. Set threshold condition B of the blacklist threshold, namely that the maximum threshold for the number of different keywords for which a calling source violates threshold condition A is M2, where M2 is a positive integer;
Step S4. Receive a message;
Step S5. Judge whether the content of the received message violates a keyword,
If no keyword is violated, return to step S4;
If a keyword is violated, go to step S6;
Step S6. Store in the message queue, namely record the calling number of the calling source of the message found to violate a keyword in step S5, the time at which this calling source sent the message, and the keyword code of the violated keyword, and store these three items in the message queue;
Step S7. Judge whether the count for this keyword of this calling number has reached threshold condition A within time period P, namely judge whether, in the message queue with keyword codes from step S6, the count of the same keyword for the calling source within time period P exceeds the maximum threshold M1 of threshold condition A,
If the maximum threshold M1 of threshold condition A is not exceeded, continue monitoring and return to step S4;
If the maximum threshold M1 of threshold condition A is exceeded, go to step S8 for further judgment;
Step S8. Judge whether this calling source reaches threshold condition B, namely judge whether, for the calling source that reached the maximum threshold M1 in step S7, the number of different keywords that reach threshold condition A exceeds the maximum threshold M2 of threshold condition B,
If the maximum threshold M2 is not exceeded, the calling source is not blacklisted; only the correspondence between this calling source and its keyword is recorded in the message queue, and the method goes to step S9;
If the maximum threshold M2 is exceeded, the calling source is blacklisted, and the method goes to step S10;
Step S9. Intercept the messages from this calling source that match the keyword, then return to step S4;
Step S10. Blacklist, namely this calling source is a spam message source and is added to the blacklist;
Step S11. Intercept all messages from the calling sources on the blacklist.
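The flow of steps S4 to S11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and method names are invented here, keyword violation is assumed to be a plain substring match, time is measured in seconds, and the thresholds are compared with a strict "greater than" because steps S7 and S8 speak of exceeding M1 and M2. Where the text says a threshold is "reached", a greater-than-or-equal comparison may be intended instead.

```python
from collections import defaultdict, deque


class SpamSourceDetector:
    """Sketch of steps S4 to S11. All names are illustrative assumptions."""

    def __init__(self, keyword_table, period, m1, m2):
        self.keyword_table = keyword_table  # S1: keyword text -> keyword code
        self.period = period                # S2: time period P, in seconds
        self.m1 = m1                        # S2: per-keyword maximum threshold M1
        self.m2 = m2                        # S3: distinct-keyword maximum threshold M2
        # S6: message queue, stored as caller -> keyword code -> sending times
        self.queue = defaultdict(lambda: defaultdict(deque))
        self.blacklist = set()

    def receive(self, caller, text, now):
        """Process one message and return True if it should be intercepted."""
        if caller in self.blacklist:        # S11: block everything from the source
            return True
        # S5: which keywords does the content violate? (substring match assumed)
        codes = {code for kw, code in self.keyword_table.items() if kw in text}
        if not codes:
            return False                    # back to S4
        per_code = self.queue[caller]
        for code in codes:
            per_code[code].append(now)      # S6: record the violation
        for times in per_code.values():     # keep only records inside period P
            while times and now - times[0] >= self.period:
                times.popleft()
        # S7: keywords whose count within P exceeds M1 (strict '>' assumed)
        over_m1 = {code for code, times in per_code.items() if len(times) > self.m1}
        # S8: does the number of such keywords exceed M2?
        if len(over_m1) > self.m2:
            self.blacklist.add(caller)      # S10: the source is a spam source
            return True
        return bool(codes & over_m1)        # S9: block only over-limit keywords
```

Keeping one deque of sending times per calling source and keyword code makes the sliding window over period P cheap to maintain, since expired records are always at the left end.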
The beneficial effects of the invention are as follows. With the method for identifying spam message sources based on keywords and frequency, when the content of messages sent by a calling source violates a specific keyword and the traffic exceeds the keyword threshold that has been set, only the user's messages that violate the keyword whose threshold was exceeded are intercepted. Only when the number of different keywords violated by the user at the same time reaches the keyword blacklist threshold is the calling source judged to be a spam message source and all messages from this source intercepted. This excludes the case in which messages from a normal source occasionally violate a keyword, and reduces the false-interception rate of spam message interception. The invention can therefore identify messages that match keywords while also reducing the false-interception rate, so that spam message sources are identified accurately.
Description of drawings
Fig. 1 is a flow chart of an embodiment of the method of the present invention for identifying spam message sources based on keywords and frequency;
Fig. 2 is a schematic diagram of the keyword definition table according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the message queue with keyword codes according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the statistics queue by calling source and keyword according to an embodiment of the invention.
Embodiment
A method for identifying spam message sources based on keywords and frequency comprises:
Setting the correspondence between keywords and keyword codes;
Setting a keyword message frequency threshold, which is the maximum threshold for violating the same keyword within a period of time;
Setting a blacklist threshold, namely the maximum threshold for the number of different keywords for which a calling source exceeds the keyword message frequency threshold within the set time range; a source that exceeds this maximum threshold is blacklisted and intercepted;
Receiving a message, judging whether it violates a set keyword, and recording the messages sent by the calling source whose content violates a keyword;
When the number of messages violating the same keyword that the calling source sends within the set time range exceeds the maximum threshold of the keyword message frequency threshold, intercepting the messages from this calling source that match this keyword;
When, in addition, the number of different keywords intercepted for this calling source within the set time range exceeds the maximum threshold of the blacklist threshold, regarding this calling source as a spam message source and intercepting all messages from this calling source.
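The keyword-to-code correspondence and the recording of violating messages described above can be illustrated with a small Python sketch. The keyword codes 501 to 503 are the ones used in the embodiment; the keyword strings, the record type, and the function names are placeholders invented for this illustration, and keyword violation is again assumed to be a case-insensitive substring match.

```python
from typing import NamedTuple

# Keyword definition table: keyword -> keyword code.
# The codes come from the embodiment; the keyword strings are hypothetical.
KEYWORD_TABLE = {"free prize": 501, "cheap loan": 502, "invoice": 503}


class QueueRecord(NamedTuple):
    """One message-queue entry: the three items recorded per violation."""
    caller: str     # calling number of the calling source
    sent_at: float  # sending time of the message, in seconds
    code: int       # keyword code of the violated keyword


def match_codes(text, table):
    """Return the codes of all keywords found in the message text."""
    lowered = text.lower()
    return sorted(code for keyword, code in table.items() if keyword in lowered)


def to_records(caller, sent_at, text, table=KEYWORD_TABLE):
    """Build one queue record per violated keyword (empty if none is violated)."""
    return [QueueRecord(caller, sent_at, code) for code in match_codes(text, table)]
```

Storing the keyword code rather than the keyword text keeps the queue records compact and lets the keyword wording change without touching recorded data.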
The invention is further described below with reference to an embodiment.
Referring to Fig. 1, this embodiment comprises the following steps:
Step S1. Set a keyword definition table (see Fig. 2), which contains the keywords and their corresponding keyword codes;
Step S2. Set threshold condition A of the keyword message frequency threshold, namely that the maximum threshold for messages violating the same keyword sent by a calling source within time period P is M1, where P is a positive number and M1 is a positive integer. In this embodiment, the keywords represented by keyword codes 501/502/503 are set, with P = 1 day and M1 = 5 messages, that is, the maximum threshold M1 is 5 messages per day for the same keyword;
Step S3. Set threshold condition B of the blacklist threshold, namely that the maximum threshold for the number of different keywords for which a calling source violates threshold condition A is M2, where M2 is a positive integer. In this embodiment, M2 = 3, that is, 3 keywords per day;
Step S4. Receive a message;
Step S5. Judge whether the content of the message received in step S4 violates a keyword,
If no keyword is violated, return to step S4;
If a keyword is violated, go to step S6;
Step S6. Store in the message queue, namely record the calling number of the calling source of the message found to violate a keyword in step S5, the time at which this calling source sent the message, and the keyword code of the violated keyword, and store these three items in the message queue. Fig. 3 shows this message queue with keyword codes;
Step S7. Judge whether the count for this keyword of this calling number has reached threshold condition A within time period P, namely judge whether, in the message queue with keyword codes from step S6, the count of the same keyword for the calling source within time period P exceeds the maximum threshold M1 of threshold condition A,
If the maximum threshold M1 of threshold condition A is not exceeded, continue monitoring and return to step S4;
If the maximum threshold M1 of threshold condition A is exceeded, go to step S8 for further judgment;
Step S8. Judge whether this calling source reaches threshold condition B, namely judge whether, for the calling source that reached the maximum threshold M1 in step S7, the number of different keywords that reach threshold condition A exceeds the maximum threshold M2 of threshold condition B (see Fig. 4),
If the maximum threshold M2 is not exceeded, the calling source is not blacklisted; only the correspondence between this calling source and its keyword is recorded in the message queue, and the method goes to step S9;
If the maximum threshold M2 is exceeded, the calling source is blacklisted, and the method goes to step S10;
Step S9. When the maximum threshold M2 is not exceeded, intercept the messages from this calling source that match the keyword, then return to step S4;
Step S10. Blacklist: as shown in Fig. 4, the maximum threshold M2 (3 keywords per day) has been reached, namely this calling source is a spam message source and is added to the blacklist;
Step S11. Intercept all messages from the calling sources on the blacklist.
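To make the embodiment's numbers concrete, the following Python sketch derives per-keyword statistics (in the spirit of Fig. 4) from a message queue of (calling number, sending time, keyword code) records (in the spirit of Fig. 3), using P = 1 day and M1 = 5. The function names and the sample queue are hypothetical, times are in seconds, and M1 is compared with a strict "greater than" as an assumption.

```python
from collections import Counter

P_SECONDS = 24 * 60 * 60  # P = 1 day, as in the embodiment
M1 = 5                    # M1 = 5 messages per keyword per day
M2 = 3                    # M2 = 3 keywords per day


def keyword_counts(queue, caller, now, period=P_SECONDS):
    """Count one caller's queued records per keyword code within the last period."""
    return Counter(
        code for number, sent_at, code in queue
        if number == caller and 0 <= now - sent_at < period
    )


def keywords_over_m1(counts, m1=M1):
    """Return the keyword codes whose count exceeds M1 (strict '>' assumed)."""
    return sorted(code for code, n in counts.items() if n > m1)


# Hypothetical queue: caller "A" sends 6 messages for each of three keywords,
# caller "B" sends 5 messages for one keyword, all within the same hour.
queue = [("A", 600 * i, code) for code in (501, 502, 503) for i in range(6)]
queue += [("B", 600 * i, 501) for i in range(5)]

now = 3600
stats_a = keyword_counts(queue, "A", now)
stats_b = keyword_counts(queue, "B", now)
print(dict(stats_a), keywords_over_m1(stats_a))
print(dict(stats_b), keywords_over_m1(stats_b))
```

In this sample, caller "A" has three keywords over M1, which equals M2 = 3, while caller "B" has none. Whether "A" is blacklisted depends on whether threshold condition B is read as "reaches M2" (the wording of step S10 in the embodiment) or "exceeds M2" (the wording of step S8); the sketch deliberately leaves that comparison to the reader.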
In summary, the present invention is a method for identifying spam message sources based on keywords and frequency that evaluates keywords and frequency jointly. For a keyword on which a source has not reached the keyword message maximum threshold, no source-and-keyword paired interception is carried out; for a source that has reached the keyword message maximum threshold, source-and-keyword paired interception is carried out. It is then judged whether the source reaches the blacklist maximum threshold: for a source that has not reached it, only the messages matching the corresponding keyword are intercepted; for a source that has reached the blacklist maximum threshold, all messages from the source are intercepted. With this method, spam message sources can be identified accurately.
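The three outcomes summarized above can be written as a single decision function. The sketch below is illustrative only: the `Action` names and the `decide` signature are assumptions, and both thresholds are compared with "greater than", which is one reading of the text.

```python
from enum import Enum


class Action(Enum):
    PASS = "no interception"
    BLOCK_KEYWORD = "intercept messages matching the keyword"
    BLOCK_ALL = "intercept all messages from the source"


def decide(count_for_keyword, keywords_over_limit, m1, m2):
    """Map the two counts to one of the three outcomes.

    count_for_keyword: messages from the source violating this keyword within P.
    keywords_over_limit: number of different keywords for which the source
        is over the per-keyword threshold within the set time range.
    """
    if count_for_keyword <= m1:
        return Action.PASS            # per-keyword threshold not exceeded
    if keywords_over_limit <= m2:
        return Action.BLOCK_KEYWORD   # source-and-keyword paired interception
    return Action.BLOCK_ALL           # blacklist the source


# Example with the embodiment's thresholds M1 = 5 and M2 = 3.
print(decide(6, 1, m1=5, m2=3))
```

Ordering the checks this way mirrors the text: the blacklist condition is only evaluated for a source that is already over the per-keyword threshold.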
Applying this method in a message optimization system reduces the false-interception rate for spam messages and improves the hit rate of spam message identification.
The present invention has been described in detail above with reference to the accompanying drawings and an embodiment, and those skilled in the art can devise many variations of the invention on the basis of the above description. Accordingly, certain details of the embodiment should not be construed as limiting the invention, and the scope of protection of the invention is the scope defined by the appended claims.

Claims (2)

1. A method for identifying spam message sources based on keywords and frequency, characterized in that:
A keyword message frequency threshold is set. This threshold is the maximum threshold for violating the same keyword within a period of time. When the number of messages violating the same keyword that a calling source sends within the set time range exceeds the maximum threshold of the keyword message frequency threshold, the messages from this calling source that match this keyword are intercepted;
A blacklist threshold is set, namely the maximum threshold for the number of different keywords for which a calling source exceeds the keyword message frequency threshold within the set time range. When the number of different keywords violated by the calling source within the set time range exceeds this maximum threshold, the calling source is judged to be a spam message source and all messages from this calling source are intercepted.
2. The method for identifying spam message sources based on keywords and frequency according to claim 1, characterized in that it comprises the following steps:
Step S1. Set a keyword definition table, which contains the keywords and their corresponding keyword codes;
Step S2. Set threshold condition A of the keyword message frequency threshold, namely that the maximum threshold for messages violating the same keyword sent by a calling source within time period P is M1, where P is a positive number and M1 is a positive integer;
Step S3. Set threshold condition B of the blacklist threshold, namely that the maximum threshold for the number of different keywords for which a calling source violates threshold condition A is M2, where M2 is a positive integer;
Step S4. Receive a message;
Step S5. Judge whether the content of the received message violates a keyword,
If no keyword is violated, return to step S4;
If a keyword is violated, go to step S6;
Step S6. Store in the message queue, namely record the calling number of the calling source of the message found to violate a keyword in step S5, the time at which this calling source sent the message, and the keyword code of the violated keyword, and store these three items in the message queue;
Step S7. Judge whether the count for this keyword of this calling number has reached threshold condition A within time period P, namely judge whether, in the message queue with keyword codes from step S6, the count of the same keyword for the calling source within time period P exceeds the maximum threshold M1 of threshold condition A,
If the maximum threshold M1 of threshold condition A is not exceeded, continue monitoring and return to step S4;
If the maximum threshold M1 of threshold condition A is exceeded, go to step S8 for further judgment;
Step S8. Judge whether this calling source reaches threshold condition B, namely judge whether, for the calling source that reached the maximum threshold M1 in step S7, the number of different keywords that reach threshold condition A exceeds the maximum threshold M2 of threshold condition B,
If the maximum threshold M2 is not exceeded, the calling source is not blacklisted; only the correspondence between this calling source and its keyword is recorded in the message queue, and the method goes to step S9;
If the maximum threshold M2 is exceeded, the calling source is blacklisted, and the method goes to step S10;
Step S9. Intercept the messages from this calling source that match the keyword, then return to step S4;
Step S10. Blacklist, namely this calling source is a spam message source and is added to the blacklist;
Step S11. Intercept all messages from the calling sources on the blacklist.
CN 201010121687 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources Expired - Fee Related CN101771966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010121687 CN101771966B (en) 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010121687 CN101771966B (en) 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources

Publications (2)

Publication Number Publication Date
CN101771966A (en) 2010-07-07
CN101771966B (en) 2013-01-23

Family

ID=42504494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010121687 Expired - Fee Related CN101771966B (en) 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources

Country Status (1)

Country Link
CN (1) CN101771966B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102480702A (en) * 2010-11-24 2012-05-30 腾讯科技(深圳)有限公司 Short message intercepting method and system
CN102843343A (en) * 2011-06-23 2012-12-26 腾讯科技(深圳)有限公司 System and device for controlling junk information in network
CN102890688A (en) * 2011-07-22 2013-01-23 腾讯科技(深圳)有限公司 Method and device for detecting automatic submitted content
CN103188635A (en) * 2011-12-29 2013-07-03 上海粱江通信系统股份有限公司 Method for recognizing junk short message source based on frequency and called distribution rules
CN103188682A (en) * 2011-12-30 2013-07-03 中国移动通信集团吉林有限公司 Method and device for controlling communication number sending junk messages
CN103812826A (en) * 2012-11-08 2014-05-21 中国电信股份有限公司 Identification method, identification system, and filter system of spam mail
CN104735048A (en) * 2014-12-02 2015-06-24 北京奇虎科技有限公司 Method and device for monitoring issued information in game
CN105426405A (en) * 2015-10-29 2016-03-23 维沃移动通信有限公司 Information processing method and mobile terminal
WO2016177148A1 (en) * 2015-08-18 2016-11-10 中兴通讯股份有限公司 Short message interception method and device
CN106941440A (en) * 2016-01-04 2017-07-11 五八同城信息技术有限公司 A kind of session anti-clutter method and device
CN111556109A (en) * 2020-04-17 2020-08-18 北京达佳互联信息技术有限公司 Request processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060019684A1 (en) * 2004-07-22 2006-01-26 Xiao-Qin Yu Short message filter mechanism and communication device
CN1905408A (en) * 2006-08-04 2007-01-31 华为技术有限公司 Method and apparatus for monitoring message
CN101335920A (en) * 2008-07-15 2008-12-31 中国联合通信有限公司 Rubbish short message recognition system and method based on calling number location and transmitted content
CN101616397A (en) * 2009-07-20 2009-12-30 中兴通讯股份有限公司 A kind of supervisory control system of inter-provincial roaming message and implementation method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102480702A (en) * 2010-11-24 2012-05-30 腾讯科技(深圳)有限公司 Short message intercepting method and system
CN102843343B (en) * 2011-06-23 2016-04-20 腾讯科技(深圳)有限公司 A kind of system, frequency control apparatus and service server controlling junk information in a network
CN102843343A (en) * 2011-06-23 2012-12-26 腾讯科技(深圳)有限公司 System and device for controlling junk information in network
CN102890688A (en) * 2011-07-22 2013-01-23 腾讯科技(深圳)有限公司 Method and device for detecting automatic submitted content
CN102890688B (en) * 2011-07-22 2018-01-02 深圳市世纪光速信息技术有限公司 A kind of automatic detection method and device for submitting content
CN103188635B (en) * 2011-12-29 2017-08-08 上海粱江通信系统股份有限公司 A kind of method that junk short message source is recognized based on the frequency and called distribution rule
CN103188635A (en) * 2011-12-29 2013-07-03 上海粱江通信系统股份有限公司 Method for recognizing junk short message source based on frequency and called distribution rules
CN103188682B (en) * 2011-12-30 2016-05-25 中国移动通信集团吉林有限公司 A kind of method and device of controlling the communicating number that sends rubbish message
CN103188682A (en) * 2011-12-30 2013-07-03 中国移动通信集团吉林有限公司 Method and device for controlling communication number sending junk messages
CN103812826A (en) * 2012-11-08 2014-05-21 中国电信股份有限公司 Identification method, identification system, and filter system of spam mail
CN104735048A (en) * 2014-12-02 2015-06-24 北京奇虎科技有限公司 Method and device for monitoring issued information in game
WO2016177148A1 (en) * 2015-08-18 2016-11-10 中兴通讯股份有限公司 Short message interception method and device
CN105426405A (en) * 2015-10-29 2016-03-23 维沃移动通信有限公司 Information processing method and mobile terminal
CN106941440A (en) * 2016-01-04 2017-07-11 五八同城信息技术有限公司 A kind of session anti-clutter method and device
CN106941440B (en) * 2016-01-04 2020-09-01 五八同城信息技术有限公司 Session anti-harassment method and device
CN111556109A (en) * 2020-04-17 2020-08-18 北京达佳互联信息技术有限公司 Request processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN101771966B (en) 2013-01-23

Similar Documents

Publication Publication Date Title
CN101771966B (en) Keywords and frequency based method for identifying spam message sources
CN101790142B (en) Method and system for identifying spam message sources by combining message contents and transmission frequency
WO2016197675A1 (en) Method and apparatus for identifying crank call
CN103607705B (en) Method for filtering spam short messages and engine
CN102802133B (en) Junk information identification method, device and system
CN102611805B (en) Communication information notifying method, information uploading method, server and communication terminal
CN102036263B (en) Spam message processing method, device and system
CN102572059B (en) Method and system for incoming call processing
CN107770777B (en) Method for identifying recorded fraud calls
CN101472247A (en) Method and system for controlling rubbish short message
CN102111723B (en) Method for identifying spam short message user by analyzing short message frequency and content
CN103888919A (en) Short message monitoring method and device thereof
CN103733581B (en) Message processing method and base station
CN102932753A (en) Method for intercepting spam multimedia message on link of multimedia system
CN105472586A (en) Spam message monitoring system and method
CN101321365B (en) Rubbish message sending user identification method by message reply frequency
CN102572746B (en) A kind of method sending behavioural characteristic identification junk short message source based on the frequency and user
CN101610473A (en) MMS content method for supervising and realize the device of this method
CN102905236A (en) Method, device and system for monitoring spam short messages
CN102231874A (en) Short message processing method, device and system
CN101572870A (en) Method for monitoring junk information in communication network
KR20150047378A (en) Device of blocking voice phishing calls
CN101692684A (en) Alarm message sending method of network video monitoring platform
CN102111767A (en) Method for improving correct rate of identifying junk short message number based on called dispersed degree
CN101656923A (en) Method and system for judging spam message

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130123

Termination date: 20180311