CN101771966B

CN101771966B - Keywords and frequency based method for identifying spam message sources

Info

Publication number: CN101771966B
Application number: CN 201010121687
Authority: CN
Inventors: 肖克华; 伍贳跟
Original assignee: LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Current assignee: LIANGJIANG COMMUNICATIONS SYSTEM CO Ltd
Priority date: 2010-03-11
Filing date: 2010-03-11
Publication date: 2013-01-23
Anticipated expiration: 2030-03-11
Also published as: CN101771966A

Abstract

The invention discloses a method for identifying spam message sources based on keywords and frequency. The method comprises the following steps: setting a keyword message frequency threshold, which is the maximum number of messages violating the same keyword that a calling source may send within a period, and, when the number of messages violating the same keyword sent by a calling source exceeds this maximum, blocking that calling source's messages that match the keyword; and setting a blacklist threshold, which is the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold, and, when the number of different keywords violated by the calling source exceeds this maximum, determining that the calling source is a spam message source and blocking all of its messages. Based on the sending characteristics of spam messages, the method combines keywords with frequency and blacklists sources that violate several keywords, and can thereby block spam messages effectively.

Description

Method for identifying spam message sources based on keywords and frequency

Technical field

The present invention relates to the field of telecommunications, in particular to the monitoring and control of spam SMS messages, and is a method for identifying spam message sources by combining keywords with sending frequency.

Background technology

SMS (Short Message Service) is a basic service of mobile communication networks. While it provides users with a convenient messaging service, it also provides a channel for spreading junk information. Spam SMS is becoming increasingly serious: it not only causes customer complaints but also brings the problem of malicious fee arrears, so spam messages need to be monitored and intercepted in real time.

Spam messages have the following sending characteristics: 1. a high sending frequency; 2. repeated content, mostly fraud or advertising, which matches keyword characteristics.

Current methods for discovering spam sources are mostly based on pure frequency statistics or on keywords. If the frequency threshold is set too high, spam sources slip through; if it is set too low, the false-block rate is high. If the keyword conditions are simple, legitimate messages are easily blocked by mistake; if they are complex, they cannot cover all cases.

Summary of the invention

The object of the invention is to overcome the defects of the prior art and to provide a method for identifying spam message sources based on keywords and frequency. The method relies on a sending characteristic of spam, namely that normal users do not send large amounts of content violating different keywords. It identifies spam senders by combining the analysis of keyword-bearing messages and their frequency with the number of different keywords violated, and restricts such sources from sending messages, which makes it an effective way of catching spam.

The technical solution achieving the above object is a method for identifying spam message sources based on keywords and frequency, wherein:

a keyword message frequency threshold is set, which is the maximum number of messages violating the same keyword within a period of time; when the number of messages violating the same keyword sent by a calling source within the set time range exceeds the maximum of this keyword message frequency threshold, that calling source's messages matching the keyword are blocked;

a blacklist threshold is set, namely the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold within the set time range; when the number of different keywords violated by the calling source within the set time range exceeds this maximum, the calling source is judged to be a spam message source and all of its messages are intercepted.
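The interaction of the two thresholds for a single calling source can be sketched as a small decision function. This is a minimal illustration, assuming the per-keyword message counts within the set time range have already been computed; the name `evaluate_caller` and the use of string keyword codes are hypothetical. The text alternates between "exceeds" and "reaches" a maximum, so this sketch treats a threshold as triggered once the count reaches it (`>=`), which matches the embodiment's description of M2; use `>` for the strict reading.

```python
from typing import Dict, Set, Tuple


def evaluate_caller(
    keyword_counts: Dict[str, int],  # keyword code -> violating messages in the window
    m1: int,  # keyword message frequency threshold (per keyword)
    m2: int,  # blacklist threshold (number of keywords at the frequency threshold)
) -> Tuple[Set[str], bool]:
    """Return (keyword codes to block for this caller, whether to blacklist it).

    Assumption: a threshold counts as triggered when the count reaches it (>=).
    """
    # Keywords for which this caller has hit the keyword message frequency threshold.
    blocked = {code for code, count in keyword_counts.items() if count >= m1}
    # The caller is a spam source once enough different keywords are blocked.
    blacklisted = len(blocked) >= m2
    return blocked, blacklisted
```

For example, with `m1=5` and `m2=3`, a caller with six messages on one keyword and one on another only has the first keyword blocked; it is blacklisted only when three different keywords each reach five messages.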

The above method for identifying spam message sources based on keywords and frequency comprises the following steps:

Step S1. Set up the keyword definition table, which contains the keywords and their corresponding keyword codes;

Step S2. Set threshold condition A of the keyword message frequency threshold: the maximum number of messages violating the same keyword that a calling source may send within time period P is M1, where P is a positive number and M1 is a positive integer;

Step S3. Set threshold condition B of the blacklist threshold: the maximum number of different keywords for which a calling source may violate threshold condition A is M2, where M2 is a positive integer;

Step S4. Receive a message;

Step S5. Determine whether the content of the received message violates a keyword:

if no keyword is violated, return to step S4;

if a keyword is violated, go to step S6;

Step S6. Store the message in the message queue: record the calling number of the calling source that sent the keyword-violating message in step S5, the time at which the calling source sent the message, and the keyword code corresponding to the violated keyword, and store these three items in the message queue;

Step S7. Determine whether the count for this keyword of this calling number within time period P has reached threshold condition A, i.e., whether, in the keyword-coded message queue of step S6, the calling source's count for the same keyword within time period P exceeds maximum M1 of threshold condition A:

if it does not exceed maximum M1 of threshold condition A, continue monitoring and return to step S4;

if it exceeds maximum M1 of threshold condition A, go to step S8 for further judgment;

Step S8. Determine whether this calling source has reached threshold condition B, i.e., whether, for the calling source that reached maximum M1 in step S7, the number of different keywords for which it has reached threshold condition A exceeds maximum M2 of threshold condition B:

if it does not exceed maximum M2, the calling source is not blacklisted; only the correspondence between this calling source and its keyword is recorded in the message queue, and the method goes to step S9;

if it exceeds maximum M2, the calling source is blacklisted and the method goes to step S10;

Step S9. Intercept this calling source's messages that match the keyword, then return to step S4;

Step S10. Blacklist: the calling source is a spam message source and is added to the blacklist;

Step S11. Intercept all messages from the blacklisted calling source.
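Steps S4 to S11 can be sketched as a streaming processor that keeps, per calling number and keyword code, the send times falling inside period P. This is a hedged illustration, not the patented implementation: the class and method names are hypothetical, messages are assumed to arrive in non-decreasing time order, keyword detection (step S5) is assumed to have already produced the list of keyword codes, and thresholds are treated as triggered on reaching them (`>=`). The blacklist is never cleared, since the text does not describe removal.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set


class SpamSourceDetector:
    """Hypothetical sketch of steps S4-S11 with per-caller, per-keyword windows."""

    def __init__(self, period_p: float, m1: int, m2: int) -> None:
        self.period_p = period_p  # time period P (same unit as the timestamps)
        self.m1 = m1  # threshold condition A
        self.m2 = m2  # threshold condition B
        # Message queue (S6): calling number -> keyword code -> send times.
        self.queue: Dict[str, Dict[str, Deque[float]]] = defaultdict(
            lambda: defaultdict(deque))
        self.blacklist: Set[str] = set()  # S10

    def _count(self, caller: str, code: str, now: float) -> int:
        """Number of messages for (caller, code) sent within P before `now`."""
        times = self.queue[caller][code]
        while times and now - times[0] > self.period_p:
            times.popleft()  # discard records older than period P
        return len(times)

    def process(self, caller: str, sent_at: float, codes: Iterable[str]) -> str:
        """Handle one received message (S4); return "deliver" or "intercept"."""
        if caller in self.blacklist:
            return "intercept"  # S11: all messages of a blacklisted source
        code_set = set(codes)
        if not code_set:
            return "deliver"  # S5: no keyword violated
        for code in code_set:
            self.queue[caller][code].append(sent_at)  # S6: record the message
        # S7: keywords of this caller that have reached threshold A within P.
        hit = {c for c in list(self.queue[caller])
               if self._count(caller, c, sent_at) >= self.m1}
        if len(hit) >= self.m2:  # S8: threshold condition B
            self.blacklist.add(caller)  # S10
            return "intercept"  # S11
        # S9: intercept only messages carrying a keyword that reached threshold A.
        return "intercept" if code_set & hit else "deliver"
```

A usage sketch: `SpamSourceDetector(period_p=86400, m1=5, m2=3)` with timestamps in seconds delivers the first four messages of a caller on keyword code "501" and intercepts the fifth, while that caller's messages on other keywords are still delivered.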

The beneficial effects of the invention are as follows. With this method, when the content sent by a calling source violates a specific keyword and its volume exceeds the set keyword threshold, only the user's messages violating the keyword that exceeded the threshold are intercepted; only when the number of different keywords the user violates reaches the keyword blacklist threshold is the calling source judged to be a spam message source and all of its messages intercepted. This excludes cases in which a normal message source occasionally violates a keyword and reduces the false-block rate of spam interception. The invention can therefore identify messages that match keywords while keeping the false-block rate low, and identify spam message sources accurately.

Description of drawings

Fig. 1 is a flow chart of an embodiment of the method for identifying spam message sources based on keywords and frequency according to the present invention;

Fig. 2 is a schematic diagram of the keyword definition table of an embodiment of the invention;

Fig. 3 is a schematic diagram of the message queue with keyword codes of an embodiment of the invention;

Fig. 4 is a schematic diagram of the statistics queue by calling number and keyword of an embodiment of the invention.
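The relationship between the message queue of Fig. 3 and the statistics of Fig. 4 can be sketched as a grouping step. The three fields of a queue record (calling number, send time, keyword code) come from step S6; the record type, the function name `build_statistics` and the (calling number, keyword code) tuple key are hypothetical, since the figures' exact layout is not reproduced in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class QueueRecord:
    """One entry of the message queue with keyword codes (Fig. 3, step S6)."""
    calling_number: str
    sent_at: float  # send time
    keyword_code: str


def build_statistics(
    queue: List[QueueRecord], now: float, period_p: float
) -> Dict[Tuple[str, str], int]:
    """Count records per (calling number, keyword code) sent within period P.

    This is an assumed form of the statistics queue of Fig. 4.
    """
    return dict(Counter(
        (r.calling_number, r.keyword_code)
        for r in queue
        if 0 <= now - r.sent_at <= period_p  # keep only records inside P
    ))
```

Each count in the result is the value compared against M1 in step S7; counting how many keyword codes of one calling number reach M1 gives the value compared against M2 in step S8.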

Embodiment

A method for identifying spam message sources based on keywords and frequency comprises:

setting the correspondence between keywords and keyword codes;

setting a keyword message frequency threshold, which is the maximum number of messages violating the same keyword within a period of time;

setting a blacklist threshold, namely the maximum number of different keywords for which a calling source may exceed the keyword message frequency threshold within the set time range; a calling source that exceeds this maximum is blacklisted and intercepted;

receiving messages, judging whether they violate a set keyword, and recording the calling source that sent a message whose content violates a keyword;

when the number of messages violating the same keyword sent by a calling source within the set time range exceeds the maximum of the keyword message frequency threshold, blocking that calling source's messages matching the keyword;

when, in addition, the number of different keywords intercepted for the calling source within the set time range exceeds the maximum of the blacklist threshold, deeming the calling source a spam message source and intercepting all of its messages.
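The step of judging whether a received message violates a set keyword can be sketched as a lookup against the keyword definition table. The text does not say how a keyword is matched, so this sketch assumes a case-insensitive substring match; the keyword strings in the table are hypothetical placeholders (only the codes 501/502/503 appear in the embodiment), as is the function name `violated_codes`.

```python
from typing import Dict, List

# Hypothetical keyword definition table: keyword code -> keyword.
# The codes follow the embodiment; the keyword strings are placeholders.
KEYWORD_TABLE: Dict[str, str] = {
    "501": "invoice",
    "502": "lottery",
    "503": "loan",
}


def violated_codes(content: str, table: Dict[str, str] = KEYWORD_TABLE) -> List[str]:
    """Return the codes of all keywords contained in the message content.

    Assumption: plain case-insensitive substring matching.
    """
    text = content.casefold()
    return sorted(code for code, keyword in table.items()
                  if keyword.casefold() in text)
```

An empty result corresponds to "no keyword violated" (return to receiving messages); a non-empty result supplies the keyword codes that are recorded in the message queue together with the calling number and send time.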

The invention is further described below with reference to an embodiment.

Referring to Fig. 1, this embodiment comprises the following steps:

Step S1. Set up the keyword definition table (see Fig. 2), which contains the keywords and their corresponding keyword codes;

Step S2. Set threshold condition A of the keyword message frequency threshold: the maximum number of messages violating the same keyword that a calling source may send within time period P is M1, where P is a positive number and M1 is a positive integer. In this embodiment, the keywords represented by keyword codes 501, 502 and 503 are set, P = 1 day and M1 = 5 messages, i.e., the maximum M1 is 5 messages with the same keyword per day;

Step S3. Set threshold condition B of the blacklist threshold: the maximum number of different keywords for which a calling source may violate threshold condition A is M2, where M2 is a positive integer. In this embodiment, M2 = 3, i.e., 3 keywords per day;

Step S4. Receive a message;

Step S5. Determine whether the content of the message received in step S4 violates a keyword:

if no keyword is violated, return to step S4;

if a keyword is violated, go to step S6;

Step S6. Store the message in the message queue: record the calling number of the calling source that sent the keyword-violating message in step S5, the time at which the calling source sent the message, and the keyword code corresponding to the violated keyword, and store these three items in the message queue; Fig. 3 shows this message queue with keyword codes;

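Step S7. Determine whether the count for this keyword of this calling number within time period P has reached threshold condition A, i.e., whether the calling source's count for the same keyword within time period P in the keyword-coded message queue of step S6 exceeds maximum M1 of threshold condition A; if not, return to step S4; if so, go to step S8;

Step S8. Determine whether this calling source has reached threshold condition B, i.e., whether, for the calling source that reached maximum M1 in step S7, the number of different keywords for which it has reached threshold condition A exceeds maximum M2 of threshold condition B, as shown in Fig. 4;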

Step S9. If maximum M2 is not exceeded, intercept this calling source's messages that match the keyword, then return to step S4;

Step S10. Blacklist: as shown in Fig. 4, the calling source has reached maximum M2 (3 keywords per day), i.e., it is a spam message source, and it is added to the blacklist;

Step S11. Intercept all messages from the blacklisted calling source.
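With the embodiment's parameters (P = 1 day, M1 = 5, M2 = 3, keyword codes 501/502/503), one calling source's day can be replayed to see which message triggers each outcome. This is a hypothetical worked example: the function `replay` and the message timings are invented for illustration, every message is assumed to fall inside the same one-day period P, and thresholds are treated as triggered on reaching them (`>=`), consistent with step S10's statement that the source "has reached" M2.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

P_SECONDS = 24 * 3600  # P = 1 day
M1 = 5  # 5 messages with the same keyword per day
M2 = 3  # 3 keywords per day


def replay(day: List[Tuple[int, str]]) -> Tuple[List[int], Optional[int]]:
    """Replay one caller's keyword-violating messages as (send time, keyword code).

    Returns the indices of messages intercepted by caller/keyword pairing (S9)
    and the index at which the caller is blacklisted (S10), or None.
    """
    counts: Counter = Counter()
    hit: Set[str] = set()
    paired: List[int] = []
    for i, (t, code) in enumerate(day):
        if not 0 <= t < P_SECONDS:
            raise ValueError("this sketch assumes all messages fall within one period P")
        counts[code] += 1
        if counts[code] >= M1:  # threshold condition A reached for this keyword
            hit.add(code)
        if len(hit) >= M2:  # threshold condition B reached: blacklist
            return paired, i
        if code in hit:
            paired.append(i)
    return paired, None
```

A caller sending five messages each on 501, 502 and 503, in that order, has message 4 (the fifth on 501) and message 9 (the fifth on 502) intercepted by pairing and is blacklisted at message 14; a caller sending six messages on 501 only has messages 4 and 5 intercepted and is never blacklisted.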

In summary, the present invention is a method for identifying spam message sources that evaluates keywords and frequency jointly. For a source that has not reached the keyword message maximum for a keyword, no source/keyword paired interception is performed; for a source that has reached it, the source and keyword are intercepted as a pair. The method then judges whether the source has reached the blacklist maximum: for a source that has not, only its corresponding keywords are intercepted; for a source that has, all of its messages are intercepted. The method thus identifies spam message sources accurately.

Applying this method in an SMS optimization system can reduce the false-block rate for spam messages and improve the hit rate of spam identification.

The present invention has been described in detail above with reference to the accompanying drawings and an embodiment, and those skilled in the art can make many variations of the invention based on the above description. Therefore, certain details of the embodiment should not be construed as limiting the invention, whose scope of protection is defined by the appended claims.

Claims

1. A method for identifying spam message sources based on keywords and frequency, characterized in that,

2. The method for identifying spam message sources based on keywords and frequency according to claim 1, characterized by comprising the following steps:

Step S4. Receive a message;

if no keyword is violated, return to step S4;

if a keyword is violated, go to step S6;

Step S7. Determine whether the count for this keyword of this calling number within time period P has reached threshold condition A, i.e., whether the calling source's count for the same keyword within time period P in the keyword-coded message queue of step S6 exceeds maximum M1 of threshold condition A;

Step S8. Determine whether this calling source has reached threshold condition B, i.e., whether, for the calling source that reached maximum M1 in step S7, the number of different keywords for which it has reached threshold condition A exceeds maximum M2 of threshold condition B;

Step S11. Intercept all messages from the blacklisted calling source.