CN101771966B - Keywords and frequency based method for identifying spam message sources - Google Patents

Info

Publication number
CN101771966B
Authority
CN
China
Prior art keywords
keyword
source
thresholds
max
note
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201010121687
Other languages
Chinese (zh)
Other versions
CN101771966A (en)
Inventor
肖克华
伍贳跟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Original Assignee
LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-03-11
Filing date
2010-03-11
Publication date
2013-01-23
Application filed by LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Priority to CN 201010121687
Publication of CN101771966A
Application granted
Publication of CN101771966B
Legal status: Expired - Fee Related

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for identifying spam short message sources based on keywords and frequency. The method comprises the following steps: setting a keyword message frequency threshold, which is the maximum number of messages matching the same keyword that a calling source may send within a period, and, when the number of messages matching the same keyword sent by a calling source exceeds this maximum, blocking that source's messages matching the keyword; and setting a blacklist threshold, which is the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold, and, when the number of different keywords for which the calling source exceeds the frequency threshold is greater than this maximum, determining that the calling source is a spam source and blocking all of its messages. By combining keywords with frequency and blacklisting sources that exceed the threshold on several keywords at once, the method follows the sending characteristics of spam and blocks it effectively.

Description

A method for identifying spam short message sources based on keywords and frequency
Technical field
The present invention relates to the field of telecommunications, in particular to the monitoring and control of spam short messages, and provides a method for identifying spam short message sources by combining keywords with sending frequency.
Background art
The Short Message Service (SMS) is a basic service of mobile communication networks. While it gives users a convenient messaging service, it also provides a channel for spreading junk information. Spam short messages are a growing problem: they lead to customer complaints and are also associated with malicious non-payment of fees, so spam needs to be monitored and intercepted in real time.
Spam short messages have two sending characteristics: (1) the sending frequency is high; (2) the content is repetitive, mostly fraud or advertising, and matches keyword patterns.
Current methods of finding spam sources mostly rely on pure frequency statistics or on keywords. If the frequency threshold is set too high, spam sources slip through; if it is set too low, the false-interception rate is high. If keyword conditions are too simple, legitimate messages are easily blocked by mistake; if they are too complex, they cannot cover all spam.
Summary of the invention
The object of the invention is to overcome the defects of the prior art by providing a method for identifying spam short message sources based on keywords and frequency. The method relies on a sending characteristic of spam, namely that normal users do not send large numbers of messages matching several different keywords. It identifies spam senders by combining keyword matching, the per-keyword message frequency, and the number of different keywords matched, and restricts such sources from sending messages, which makes it an effective way of intercepting spam.
The technical solution that achieves this object is a method for identifying spam short message sources based on keywords and frequency, wherein:
A keyword message frequency threshold is set. This threshold is the maximum number of messages matching the same keyword within a period of time. When the number of messages matching the same keyword that a calling source sends within the set time range exceeds this maximum, the messages from that calling source that match the keyword are blocked.
A blacklist threshold is set. This threshold is the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold within the set time range. When the number of such keywords exceeds this maximum, the calling source is judged to be a spam source and all of its messages are intercepted.
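As an illustration only, the two thresholds can be read as a single decision over per-keyword counts. The sketch below is not taken from the patent: the names `Action`, `decide`, `m1` and `m2` are hypothetical, and it assumes that "exceeds" means a strict comparison and that the counts already cover only the current period.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"                  # deliver the message
    BLOCK_KEYWORD = "block_keyword"  # block this source's messages for this keyword only
    BLOCK_ALL = "block_all"          # source is treated as blacklisted


def decide(counts: dict[str, int], keyword: str, m1: int, m2: int) -> Action:
    """Decide what to do with a message that matched `keyword`.

    `counts` maps each keyword to the number of matching messages the
    calling source has sent in the current period, including this one.
    `m1` is the keyword message frequency threshold, `m2` the blacklist
    threshold (both hypothetical parameter names).
    """
    # Keywords for which this source has exceeded the frequency threshold.
    over_limit = {k for k, n in counts.items() if n > m1}
    if len(over_limit) > m2:
        return Action.BLOCK_ALL
    if keyword in over_limit:
        return Action.BLOCK_KEYWORD
    return Action.ALLOW
```

For example, with `m1=5` and `m2=2`, a source with six "loan" messages has only those messages blocked, while a source over the limit on three different keywords has everything blocked. If "reaches" rather than "exceeds" is intended, the two comparisons would become `>=`.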
The above method for identifying spam short message sources based on keywords and frequency comprises the following steps:
Step S1. Set a keyword definition table, which contains keywords and their corresponding keyword codes;
Step S2. Set threshold condition A, the keyword message frequency threshold: the maximum number of messages matching the same keyword that a calling source may send within time period P is M1, where P is a positive number and M1 is a positive integer;
Step S3. Set threshold condition B, the blacklist threshold: the maximum number of different keywords for which a calling source may violate threshold condition A is M2, where M2 is a positive integer;
Step S4. Receive a short message;
Step S5. Judge whether the content of the received message matches a keyword:
If it does not match any keyword, return to step S4;
If it matches a keyword, go to step S6;
Step S6. Store the message in the short message queue: record the calling number of the calling source that sent the keyword-matching message in step S5, the time at which the calling source sent the message, and the keyword code of the matched keyword, and store these three items in the short message queue;
Step S7. Judge whether the count for this keyword and this calling number has reached threshold condition A within time period P, that is, whether the number of entries in the keyword-coded short message queue of step S6 for the same calling source and the same keyword within time period P exceeds the maximum threshold M1 of threshold condition A:
If the maximum threshold M1 is not exceeded, continue monitoring and return to step S4;
If the maximum threshold M1 is exceeded, go to step S8 for a further judgment;
Step S8. Judge whether the calling source that exceeded the maximum threshold M1 in step S7 has reached threshold condition B, that is, whether the number of different keywords for which the calling source has reached threshold condition A exceeds the maximum threshold M2 of threshold condition B:
If the maximum threshold M2 is not exceeded, the calling source is not blacklisted; only the correspondence between the calling source and the keyword is recorded in the short message queue, and the method proceeds to step S9;
If the maximum threshold M2 is exceeded, the calling source is blacklisted, and the method proceeds to step S10;
Step S9. Intercept the messages from this calling source that match the keyword, then return to step S4;
Step S10. Blacklist the source: the calling source is a spam source and is added to the blacklist;
Step S11. Intercept all messages from blacklisted calling sources.
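Steps S4 to S11 can be sketched as a small streaming detector. This is a minimal sketch under stated assumptions, not the patented implementation: the class and method names are hypothetical, period P is treated as a sliding window in seconds, messages are assumed to arrive in time order, only the first matching keyword is recorded, and the set of over-limit keywords per caller is never expired.

```python
from collections import defaultdict, deque


class SpamSourceDetector:
    """Illustrative sketch of steps S4-S11; all names are hypothetical."""

    def __init__(self, keyword_table: dict[str, int], period: float, m1: int, m2: int):
        self.keyword_table = keyword_table   # S1: keyword -> keyword code
        self.period = period                 # S2: time period P, in seconds
        self.m1 = m1                         # S2: threshold condition A
        self.m2 = m2                         # S3: threshold condition B
        # S6 queue: (caller, keyword code) -> send times of matching messages
        self.queue = defaultdict(deque)
        self.blocked = defaultdict(set)      # caller -> keyword codes over M1
        self.blacklist = set()               # S10: blacklisted callers

    def receive(self, caller: str, text: str, sent_at: float) -> str:
        """Process one message (S4) and return 'deliver' or 'block'."""
        if caller in self.blacklist:         # S11: block everything
            return "block"
        # S5: find the first keyword contained in the text, if any.
        code = next((c for k, c in self.keyword_table.items() if k in text), None)
        if code is None:
            return "deliver"
        # S6: record the message, then drop entries older than P.
        times = self.queue[(caller, code)]
        times.append(sent_at)
        while times and sent_at - times[0] >= self.period:
            times.popleft()
        # S7: compare the per-keyword count with M1.
        if len(times) <= self.m1:
            return "deliver"
        self.blocked[caller].add(code)
        # S8: compare the number of distinct over-limit keywords with M2.
        if len(self.blocked[caller]) > self.m2:
            self.blacklist.add(caller)       # S10: caller becomes a spam source
        return "block"                       # S9: this message matched an over-limit keyword
```

A caller would create one detector, for instance `SpamSourceDetector({"loan": 501}, period=86400, m1=5, m2=3)`, and pass every incoming message to `receive`. Using a deque per caller and keyword keeps the window check cheap, at the cost of one queue per pair.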
The beneficial effects of the invention are as follows. With this method, when the content of messages sent by a calling source matches a particular keyword and their number exceeds the configured keyword threshold, only that source's messages matching the over-threshold keyword are intercepted. Only when the number of different keywords on which the source exceeds the threshold also reaches the blacklist threshold is the calling source judged to be a spam source and all of its messages intercepted. This excludes the case where a normal source occasionally sends a message matching a keyword, and lowers the false-interception rate of spam filtering. The invention can therefore identify messages that match keywords while reducing false interceptions, and identifies spam sources accurately.
Description of drawings
Fig. 1 is a flow chart of one embodiment of the method of the present invention for identifying spam short message sources based on keywords and frequency;
Fig. 2 is a schematic diagram of the keyword definition table of one embodiment of the invention;
Fig. 3 is a schematic diagram of the keyword-coded short message queue of one embodiment of the invention;
Fig. 4 is a schematic diagram of the statistics queue by calling number and keyword of one embodiment of the invention.
Embodiment
A method for identifying spam short message sources based on keywords and frequency comprises:
Setting the correspondence between keywords and keyword codes;
Setting a keyword message frequency threshold, which is the maximum number of messages matching the same keyword within a period of time;
Setting a blacklist threshold, which is the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold within the set time range; a source exceeding this maximum is blacklisted and intercepted;
Receiving short messages, judging whether each matches a configured keyword, and recording the messages whose content matches a keyword together with their calling source;
When the number of messages matching the same keyword that a calling source sends within the set time range exceeds the keyword message frequency threshold, blocking the messages from that calling source that match the keyword;
When, in addition, the number of different keywords intercepted for the calling source within the set time range exceeds the blacklist threshold, treating the calling source as a spam source and intercepting all of its messages.
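The keyword-to-code correspondence and the matching judgment listed above might look like the following. This is a hedged sketch: the keyword strings are invented for illustration (the actual table of Fig. 2 is not reproduced in the text), only the codes 501 to 503 appear in the embodiment, and plain case-insensitive substring matching is an assumption.

```python
# Hypothetical keyword definition table: keyword -> keyword code.
# The keyword strings are illustrative; only the codes come from the embodiment.
KEYWORD_TABLE = {"free prize": 501, "low-interest loan": 502, "invoice": 503}


def match_keyword_codes(text: str, table: dict[str, int] = KEYWORD_TABLE) -> list[int]:
    """Return the sorted codes of all keywords that occur in `text`.

    Matching is a case-insensitive substring test, which is an assumption;
    the source does not say how a message is judged to match a keyword.
    """
    lowered = text.lower()
    return sorted(code for keyword, code in table.items() if keyword.lower() in lowered)
```

Recording the code rather than the keyword text keeps the queue entries small and lets the keyword wording change without invalidating stored counts.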
The invention is further described below with reference to an embodiment.
Referring to Fig. 1, this embodiment comprises the following steps:
Step S1. Set a keyword definition table (see Fig. 2), which contains keywords and their corresponding keyword codes;
Step S2. Set threshold condition A, the keyword message frequency threshold: the maximum number of messages matching the same keyword that a calling source may send within time period P is M1, where P is a positive number and M1 is a positive integer. In this embodiment, the keywords represented by keyword codes 501, 502 and 503 are configured, with P = 1 day and M1 = 5 messages, that is, a maximum of 5 messages matching the same keyword per day;
Step S3. Set threshold condition B, the blacklist threshold: the maximum number of different keywords for which a calling source may violate threshold condition A is M2, where M2 is a positive integer. In this embodiment, M2 = 3, that is, 3 keywords per day;
Step S4. Receive a short message;
Step S5. Judge whether the content of the message received in step S4 matches a keyword:
If it does not match any keyword, return to step S4;
If it matches a keyword, go to step S6;
Step S6. Store the message in the short message queue: record the calling number of the calling source that sent the keyword-matching message in step S5, the time at which the calling source sent the message, and the keyword code of the matched keyword, and store these three items in the short message queue. Fig. 3 shows this keyword-coded short message queue;
Step S7. Judge whether the count for this keyword and this calling number has reached threshold condition A within time period P, that is, whether the number of entries in the keyword-coded short message queue of step S6 for the same calling source and the same keyword within time period P exceeds the maximum threshold M1 of threshold condition A:
If the maximum threshold M1 is not exceeded, continue monitoring and return to step S4;
If the maximum threshold M1 is exceeded, go to step S8 for a further judgment;
Step S8. Judge whether the calling source that exceeded the maximum threshold M1 in step S7 has reached threshold condition B, that is, whether the number of different keywords for which the calling source has reached threshold condition A exceeds the maximum threshold M2 of threshold condition B (see Fig. 4):
If the maximum threshold M2 is not exceeded, the calling source is not blacklisted; only the correspondence between the calling source and the keyword is recorded in the short message queue, and the method proceeds to step S9;
If the maximum threshold M2 is exceeded, the calling source is blacklisted, and the method proceeds to step S10;
Step S9. When the maximum threshold M2 is not exceeded, intercept the messages from this calling source that match the keyword, then return to step S4;
Step S10. Blacklist the source: as shown in Fig. 4, the source has reached the maximum threshold M2 (3 keywords per day), so the calling source is a spam source and is added to the blacklist;
Step S11. Intercept all messages from blacklisted calling sources.
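The embodiment's numbers (keyword codes 501 to 503, P = 1 day, M1 = 5, M2 = 3) can be replayed in a few lines. This sketch is illustrative only and makes two labeled assumptions: all messages fall within one day, so time is ignored, and the blacklist comparison is inclusive ("reaches M2"), following the wording of step S10 in this embodiment rather than the "exceeds" wording used elsewhere. The function name `replay_one_day` is hypothetical.

```python
from collections import Counter

M1 = 5   # threshold condition A: at most 5 messages per keyword per day
M2 = 3   # threshold condition B: 3 distinct keywords per day


def replay_one_day(keyword_codes: list[int]) -> list[str]:
    """Replay one caller's keyword-matching messages, all sent within one day."""
    counts = Counter()
    over_limit = set()       # keyword codes whose count exceeded M1
    outcomes = []
    blacklisted = False
    for code in keyword_codes:
        if blacklisted:
            outcomes.append("block (blacklisted)")
            continue
        counts[code] += 1
        if counts[code] > M1:                # exceeds condition A
            over_limit.add(code)
        if len(over_limit) >= M2:            # reaches condition B (assumed inclusive)
            blacklisted = True
            outcomes.append("block (blacklisted)")
        elif code in over_limit:
            outcomes.append("block (keyword)")
        else:
            outcomes.append("deliver")
    return outcomes
```

With six messages for each of the three codes in turn, the first five per keyword are delivered, the sixth message for 501 and for 502 is blocked for that keyword only, and the sixth message for 503 puts the caller on the blacklist.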
In summary, the present invention is a method for identifying spam short message sources based on keywords and frequency, which evaluates keywords and frequency jointly. For a source and keyword that have not reached the keyword message maximum threshold, no source-keyword interception is performed; for a source that has reached the keyword message maximum threshold, source-keyword interception is performed. It is then judged whether the source has reached the blacklist maximum threshold: for a source that has not, only messages matching its recorded keywords are intercepted; for a source that has, all of its messages are intercepted. The method thus identifies spam sources accurately.
Applying this method in a short message optimization system reduces the false-interception rate for spam and improves the hit rate of spam identification.
The invention has been described in detail above with reference to the accompanying drawings and an embodiment, and those skilled in the art can devise many variations on the basis of this description. Details of the embodiment therefore should not be taken as limiting the invention, whose scope of protection is defined by the appended claims.

Claims (2)

1. A method for identifying spam short message sources based on keywords and frequency, characterized in that:
a keyword message frequency threshold is set, this threshold being the maximum number of messages matching the same keyword within a period of time; when the number of messages matching the same keyword that a calling source sends within the set time range exceeds this maximum, the messages from that calling source that match the keyword are blocked;
a blacklist threshold is set, this threshold being the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold within the set time range; when the number of such keywords within the set time range exceeds this maximum, the calling source is judged to be a spam source and all of its messages are intercepted.
2. The method for identifying spam short message sources based on keywords and frequency according to claim 1, characterized in that it comprises the following steps:
Step S1. Set a keyword definition table, which contains keywords and their corresponding keyword codes;
Step S2. Set threshold condition A, the keyword message frequency threshold: the maximum number of messages matching the same keyword that a calling source may send within time period P is M1, where P is a positive number and M1 is a positive integer;
Step S3. Set threshold condition B, the blacklist threshold: the maximum number of different keywords for which a calling source may violate threshold condition A is M2, where M2 is a positive integer;
Step S4. Receive a short message;
Step S5. Judge whether the content of the received message matches a keyword:
If it does not match any keyword, return to step S4;
If it matches a keyword, go to step S6;
Step S6. Store the message in the short message queue: record the calling number of the calling source that sent the keyword-matching message in step S5, the time at which the calling source sent the message, and the keyword code of the matched keyword, and store these three items in the short message queue;
Step S7. Judge whether the count for this keyword and this calling number has reached threshold condition A within time period P, that is, whether the number of entries in the keyword-coded short message queue of step S6 for the same calling source and the same keyword within time period P exceeds the maximum threshold M1 of threshold condition A:
If the maximum threshold M1 is not exceeded, continue monitoring and return to step S4;
If the maximum threshold M1 is exceeded, go to step S8 for a further judgment;
Step S8. Judge whether the calling source that exceeded the maximum threshold M1 in step S7 has reached threshold condition B, that is, whether the number of different keywords for which the calling source has reached threshold condition A exceeds the maximum threshold M2 of threshold condition B:
If the maximum threshold M2 is not exceeded, the calling source is not blacklisted; only the correspondence between the calling source and the keyword is recorded in the short message queue, and the method proceeds to step S9;
If the maximum threshold M2 is exceeded, the calling source is blacklisted, and the method proceeds to step S10;
Step S9. Intercept the messages from this calling source that match the keyword, then return to step S4;
Step S10. Blacklist the source: the calling source is a spam source and is added to the blacklist;
Step S11. Intercept all messages from blacklisted calling sources.
CN 201010121687 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources Expired - Fee Related CN101771966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010121687 CN101771966B (en) 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010121687 CN101771966B (en) 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources

Publications (2)

Publication Number Publication Date
CN101771966A CN101771966A (en) 2010-07-07
CN101771966B true CN101771966B (en) 2013-01-23

Family

ID=42504494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010121687 Expired - Fee Related CN101771966B (en) 2010-03-11 2010-03-11 Keywords and frequency based method for identifying spam message sources

Country Status (1)

Country Link
CN (1) CN101771966B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102480702A (en) * 2010-11-24 2012-05-30 腾讯科技(深圳)有限公司 Short message intercepting method and system
CN102843343B (en) * 2011-06-23 2016-04-20 腾讯科技(深圳)有限公司 A kind of system, frequency control apparatus and service server controlling junk information in a network
CN102890688B (en) * 2011-07-22 2018-01-02 深圳市世纪光速信息技术有限公司 A kind of automatic detection method and device for submitting content
CN103188635B (en) * 2011-12-29 2017-08-08 上海粱江通信系统股份有限公司 A kind of method that junk short message source is recognized based on the frequency and called distribution rule
CN103188682B (en) * 2011-12-30 2016-05-25 中国移动通信集团吉林有限公司 A kind of method and device of controlling the communicating number that sends rubbish message
CN103812826A (en) * 2012-11-08 2014-05-21 中国电信股份有限公司 Identification method, identification system, and filter system of spam mail
CN104735048B (en) * 2014-12-02 2019-02-12 北京奇虎科技有限公司 The monitoring method and device to release news in a kind of game
CN106470405A (en) * 2015-08-18 2017-03-01 中兴通讯股份有限公司 SMS interception method and device
CN105426405B (en) * 2015-10-29 2019-05-17 维沃移动通信有限公司 Information processing method and mobile terminal
CN106941440B (en) * 2016-01-04 2020-09-01 五八同城信息技术有限公司 Session anti-harassment method and device
CN111556109B (en) * 2020-04-17 2021-05-18 北京达佳互联信息技术有限公司 Request processing method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1905408A (en) * 2006-08-04 2007-01-31 华为技术有限公司 Method and apparatus for monitoring message
CN101335920A (en) * 2008-07-15 2008-12-31 中国联合通信有限公司 Rubbish short message recognition system and method based on calling number location and transmitted content
CN101616397A (en) * 2009-07-20 2009-12-30 中兴通讯股份有限公司 A kind of supervisory control system of inter-provincial roaming message and implementation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060019684A1 (en) * 2004-07-22 2006-01-26 Xiao-Qin Yu Short message filter mechanism and communication device

Similar Documents

Publication Publication Date Title
CN101771966B (en) Keywords and frequency based method for identifying spam message sources
CN101790142B (en) Method and system for identifying spam message sources by combining message contents and transmission frequency
CN101262648A (en) A method and system for processing spam
CN103607705B (en) Method for filtering spam short messages and engine
CN102611805B (en) Communication information notifying method, information uploading method, server and communication terminal
CN102036263B (en) Spam message processing method, device and system
CN102572059B (en) Method and system for incoming call processing
CN102802133A (en) Junk information identification method, device and system
CN101262675A (en) Method for mobile phone to prevent from spam
CN101472247A (en) Method and system for controlling rubbish short message
CN101909261A (en) Method and system for monitoring spam
CN101873618A (en) Communication monitoring method and device
CN101415188B (en) Supervision method for sending rubbish mass message
CN103369486A (en) System and method for preventing fraud SMS (Short message Service) message
CN102111723B (en) Method for identifying spam short message user by analyzing short message frequency and content
CN103888919A (en) Short message monitoring method and device thereof
CN102111731A (en) Method based on content similarity for improving recognition accuracy of spam message numbers
CN102932753A (en) Method for intercepting spam multimedia message on link of multimedia system
CN101321365B (en) Rubbish message sending user identification method by message reply frequency
CN102572746B (en) A kind of method sending behavioural characteristic identification junk short message source based on the frequency and user
CN101453707A (en) Method for monitoring rubbish information in communication network
CN102905236A (en) Method, device and system for monitoring spam short messages
CN103686736A (en) Garbage message interception method and platform
CN101572870A (en) Method for monitoring junk information in communication network
CN102231874A (en) Short message processing method, device and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130123

Termination date: 20180311
