CN101389085B - Rubbish short message recognition system and method based on sending behavior - Google Patents

Rubbish short message recognition system and method based on sending behavior

Info

Publication number
CN101389085B
Authority
CN
China
Prior art keywords
short message
content
calling number
hashed value
junk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008102242531A
Other languages
Chinese (zh)
Other versions
CN101389085A (en
Inventor
张尼
张智江
张范
顾旻霞
贾川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN2008102242531A
Publication of CN101389085A
Application granted
Publication of CN101389085B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a system and method for identifying spam short messages based on sending behavior. The method comprises: classifying short messages as suspicious according to the hash value of the message, the length of the message, and the number of messages having the same hash value; for a message that belongs to a suspicious class, recording the calling number that sent it, and otherwise handling the message according to whether its calling number appears for the first time; if the number of messages in a suspicious class reaches a preset value, acquiring all calling numbers associated with that class; and if the difference between the number of distinct hash values associated with those calling numbers and the number of those calling numbers is not greater than a preset value, judging the suspicious messages to be spam. The invention enables spam short messages received by the short message center to be identified efficiently in real time and intercepted in real time.

Description

Rubbish short message recognition system and method based on sending behavior
Technical field
The present invention relates to the field of junk short messages, and in particular to a junk short message recognition system and method based on sending behavior.
Background technology
In recent years junk short messages have become increasingly rampant, and almost every mobile phone user has been troubled by them. According to survey results published by the Internet Society of China, mobile phone users in China receive an average of 8.29 junk short messages per week.
Junk short messages fall into two types according to how they are sent. The first type is sent through the short message gateway of a mobile operator. When the user receives such a message, the sender number displayed is the access number of the short message service rather than the mobile phone number of an ordinary user. Junk short messages sent this way are fast and simple to send but require the operator's permission; most are commodity advertisements or service promotions.
The second type is sent by inserting mobile phone cards into a mass-sending device, connecting the device to a computer through a serial cable, and sending with mass-sending software on the computer (hereinafter "sending by mass-sending device"). The sender buys in bulk mobile phone cards that require no registration (such as M-ZONE or Shenzhouxing) and overdraws them, or exploits the weaknesses of discount packages to send short messages in huge volumes. A mass-sending device of this kind can connect 16 to 20 ports at the same time and send tens of thousands of short messages within a short time, so the operator often has its call charges maliciously overdrawn before it has time to bill. When the user receives such a message, the sender number displayed is an ordinary mobile phone number. Junk short messages sent this way use many numbers, are sent quickly, and do not require the operator's permission. In addition, the junk short message traffic during mass sending is huge and inevitably occupies a large amount of radio resources; to guarantee throughput, the sender usually chooses several sending points located at different base stations and sends in parallel.
As the public, the media, and public opinion pay increasing attention to junk short messages, mobile operators have stepped up their efforts against junk short messages sent through short message gateways and have implemented some simple and effective measures, such as strengthening content supervision of short message sending ports, adding the company's real signature to the message content, raising the rates for sending short messages through ports, and closing ports that attract many complaints.
After these measures were implemented, the sending of junk short messages through short message gateways decreased markedly. However, there is still no effective countermeasure against lawbreakers who use mass-sending devices to send junk short messages.
Against the use of mass-sending devices to send junk short messages, operators currently rely mainly on four mechanisms: content recognition, black and white lists, traffic statistics, and call detail record (ticket) analysis:
1. Content recognition mechanism
A commonly used content recognition technique is rule-based recognition: a number of rules are set, for example keywords such as "win a prize" or "make a fortune", and a message that matches one or more of the rules is treated as a junk short message. The advantage of rule-based recognition is that its principle and implementation are simple and its application cost is low. Its shortcomings are: 1) the rules are all specified manually and must be continually discovered, summarized, and updated by people, at considerable maintenance cost; 2) rules are difficult to choose, and keyword matching alone can hardly judge the legitimacy of message content, so misjudgment is likely; 3) junk senders can easily bypass the rule list by using methods such as pinyin or homophones.
In addition, hashing techniques, Bayesian algorithms, support vector machines, and similar methods have been adopted. These methods can learn word frequencies and patterns and use them to distinguish junk short messages from normal ones. Compared with keyword matching they are more complex and more intelligent content recognition techniques, but their shortcomings are also obvious: they are slow, they require the user to keep updating the rule base or training set, and they easily fail as the technical level of junk short message producers keeps improving.
2. Black and white list mechanism
The black and white list technique identifies junk short messages according to the sender's mobile phone number. Messages sent from numbers on the white list are passed directly without any processing, whereas any message sent from a number on the blacklist is intercepted and barred from downlink delivery. This method is simple and has very little impact on the existing system; the existing short message service center needs essentially no modification. However, the white list and blacklist must be updated in real time, and the recognition capability is limited.
3. Short message traffic statistics mechanism
The short message traffic sent or received per unit time by a given mobile phone or SP is counted, and an alarm is raised once the statistic exceeds a threshold. If an individual or service provider sending large numbers of junk short messages is detected, it is placed under supervision immediately. The first method detects the number of short messages sent per unit time. This requires a counter for each user that is incremented automatically for every short message sent; when the number of messages sent reaches a specified number, the counter raises an alarm automatically. The second method detects the interval between two short messages, that is, it monitors the sending frequency: when the interval between two short messages is too short, the user is sending short messages frequently, and an alarm is raised automatically.
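The two detection methods of this prior-art mechanism can be illustrated with a minimal Python sketch. The class name `TrafficMonitor`, its parameters, and the default limits are assumptions made for illustration; the description above does not specify concrete values or a sliding window.

```python
from collections import defaultdict, deque


class TrafficMonitor:
    """Per-sender count and interval checks (illustrative names and limits)."""

    def __init__(self, max_per_window=100, window_s=60.0, min_interval_s=1.0):
        self.max_per_window = max_per_window  # assumed "specified number"
        self.window_s = window_s              # assumed length of the unit time
        self.min_interval_s = min_interval_s  # assumed minimum allowed interval
        self._times = defaultdict(deque)      # sender -> timestamps in the window

    def observe(self, sender, now):
        """Record one sent message and return the alarms it triggers."""
        times = self._times[sender]
        alarms = []
        # Second method: the interval since the previous message is too short.
        if times and now - times[-1] < self.min_interval_s:
            alarms.append("interval")
        times.append(now)
        # Forget timestamps that have left the unit-time window.
        while now - times[0] >= self.window_s:
            times.popleft()
        # First method: the count within the unit time reaches the limit.
        if len(times) >= self.max_per_window:
            alarms.append("count")
        return alarms
```

Both checks need only the sender and a timestamp, which is why this mechanism never inspects message content.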
4. Call detail record (ticket) analysis mechanism
This mechanism uses the original call detail record files on the billing server as its statistical source. It counts the number of messages originated by each number and the sending success rate within a certain period; when a threshold is exceeded, the number is considered a suspicious user and is submitted to operating personnel, who judge whether to add it to the blacklist. The weakness of this mechanism is that it works offline: there is a delay (more than 15 minutes) between the sending of a short message and the collection of its call detail record, and lawbreakers can exploit this delay to send tens of thousands of junk short messages by copying SIM cards in bulk.
The above recognition techniques have the following shortcomings: 1) the content recognition mechanism incurs considerable maintenance cost, requires the user to keep updating the rule base or training set, cannot discover the characteristics of new types of junk short messages, has difficulty recognizing partially altered junk short messages, and infringes user privacy; 2) the black and white list technique has limited recognition capability; 3) the traffic statistics and call detail record analysis techniques have poor real-time performance.
Summary of the invention
To solve the above technical problems, the invention provides a junk short message recognition system and method based on sending behavior, whose purpose is to distinguish whether the sender of a short message is a mass-sending device or a normal user, meeting the requirements of real-time performance and accuracy without infringing user privacy.
The invention provides a junk short message recognition method based on sending behavior, comprising:
Step 1: calculate the hash value of a received short message, and record, according to this hash value, the number of short messages with identical content that have been sent and the length of the message content;
Step 2: if the number of short messages with identical content that have been sent reaches a first threshold and the length of the message content is greater than a second threshold, record the calling number of the short message and the correspondence between the calling number and the hash value of the short message; otherwise, if the calling number of the short message appears for the first time, do nothing, and if the calling number does not appear for the first time, record the correspondence between the calling number and the hash value of the short message;
Step 3: if the number of short messages that have been sent with identical content and a content length greater than the second threshold reaches a third threshold, obtain all calling numbers corresponding to those short messages; if the difference between the number of distinct hash values corresponding to all of these calling numbers and the number of these calling numbers is not greater than a fourth threshold, judge the short messages with this identical content to be junk short messages.
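The decision in Step 3 can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that "the number of distinct hash values corresponding to all calling numbers" means the size of the union of the hash sets recorded for those numbers; the text does not state this explicitly.

```python
def is_junk_class(message_count, callers, hashes_by_caller,
                  third_threshold, fourth_threshold):
    """Step 3 sketch: decide whether a suspicious class is junk.

    message_count    -- messages counted so far in the suspicious class
    callers          -- calling numbers recorded for the class
    hashes_by_caller -- dict: calling number -> set of hash values it sent
    """
    if message_count < third_threshold:
        return False  # the class has not yet reached the third threshold
    distinct_hashes = set()
    for caller in callers:
        # Assumption: distinct hash values are counted across all callers.
        distinct_hashes |= hashes_by_caller.get(caller, set())
    # Many numbers sharing few contents gives a small (even negative) difference.
    return len(distinct_hashes) - len(set(callers)) <= fourth_threshold
```

Under this reading, three numbers that all send one identical text give a difference of 1 - 3 = -2, while two ordinary users who also send several unrelated texts give a large positive difference and are not flagged.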
In Step 1, a directory and a contents table are also set up;
The directory records the calling number of each short message and the set of hash values corresponding to the short messages sent by that calling number;
The contents table records the hash value of a short message, the length of the message content, the number of short messages with identical content, and the set of all calling numbers that sent short messages with that identical content;
The hash value of the received short message, the number of short messages with identical content that have been sent, and the length of the message content are recorded in the contents table.
In Step 1, calculating the hash value of the received short message comprises:
If the length of the received message content is greater than the second threshold, determine the position of the first Chinese character and the position of the last Chinese character in the received short message, and calculate the hash value over the content from the first Chinese character to the last Chinese character; or
If the length of the received message content is less than or equal to the second threshold, calculate the hash value directly over the content of the received short message.
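A minimal Python sketch of this hash calculation follows. The text does not name a hash function, a character encoding, or a precise definition of "Chinese character"; SHA-256, UTF-16-BE (as a stand-in for UCS-2 byte lengths), the CJK Unified Ideographs range, and the function name `message_hash` are all assumptions. Falling back to the whole content when a long message contains no Chinese character is likewise an assumption, since the text does not cover that case.

```python
import hashlib


def _is_chinese(ch):
    # Assumption: "Chinese character" means the CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def message_hash(text, second_threshold=80, encoding="utf-16-be"):
    """Hash a message body, ignoring head/tail padding on long messages."""
    data = text.encode(encoding)
    if len(data) > second_threshold:
        positions = [i for i, ch in enumerate(text) if _is_chinese(ch)]
        if positions:
            # Keep only the span from the first to the last Chinese character.
            core = text[positions[0]:positions[-1] + 1]
            data = core.encode(encoding)
    # Short messages (and long ones without Chinese characters) are hashed whole.
    return hashlib.sha256(data).hexdigest()
```

With this sketch, two long messages that share the same Chinese body but carry different letters or digits at the head and tail map to the same hash value, whereas short messages are compared on their full content.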
Step 2 comprises:
Step 41: judge whether the following condition is met: the number of short messages with identical content that have been sent reaches the first threshold, and the length of the message content is greater than the second threshold; if so, perform Step 42, otherwise perform Step 43;
Step 42: write the calling number of the received short message into the directory: if the calling number appears for the first time, save the calling number and the hash value corresponding to the short message in the directory, and record the calling number in the contents table; if the calling number already exists, compare the hash value corresponding to the short message with the hash values recorded for the calling number in the directory: if it differs, record the hash value corresponding to the short message in the directory and record the calling number in the contents table; if it is the same, do nothing;
Step 43: if the calling number of the received short message appears for the first time, do nothing; if the calling number does not appear for the first time, compare the hash value corresponding to the received short message with the hash values recorded for the calling number in the directory: if it differs, record the hash value corresponding to the short message in the directory; if it is the same, do nothing.
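Steps 42 and 43 can be sketched in Python as follows, with the outcome of the Step 41 test passed in as a flag. The function name `record_sender` and the use of plain dictionaries of sets for the directory and for the caller sets of the contents table are illustrative assumptions.

```python
def record_sender(caller, msg_hash, is_suspicious, directory, class_callers):
    """Steps 42 and 43 sketch.

    caller        -- calling number of the received message
    msg_hash      -- hash value of the received message
    is_suspicious -- outcome of the Step 41 test for this message
    directory     -- dict: calling number -> set of hash values it has sent
    class_callers -- dict: hash value -> set of calling numbers (contents table)
    """
    if not is_suspicious:
        # Step 43: a first-time number is ignored; a known number gets the
        # hash value added (adding to a set is a no-op if already present).
        if caller in directory:
            directory[caller].add(msg_hash)
        return
    # Step 42: create the directory entry if the number is new, then record
    # the hash value and link the number to the class when the hash is new.
    hashes = directory.setdefault(caller, set())
    if msg_hash not in hashes:
        hashes.add(msg_hash)
        class_callers.setdefault(msg_hash, set()).add(caller)
```

The effect is that a number enters the directory only after it has sent a message in a suspicious class, and from then on every distinct content it sends is tracked.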
The contents table also records a short message class attribute, which is one of: suspicious short message class, junk short message class, and normal short message class.
In Step 2, if the number of short messages with identical content that have been sent reaches the first threshold and the length of the message content is greater than the second threshold, the short message class attribute corresponding to the short messages with this identical content is marked as suspicious short message class in the contents table.
In Step 3, the short message class attribute corresponding to the hash value of the junk short message is also marked as junk short message class.
The method further comprises Step 4: send the calling numbers of the junk short messages to the short message service center, so that the short message service center can filter junk short messages.
The method further comprises Step 5: periodically clear the normal short message classes from the contents table.
The invention provides a junk short message recognition system based on sending behavior, comprising:
A sent content processing module, configured to calculate the hash value of a received short message and to record, according to this hash value, the number of short messages with identical content that have been sent and the length of the message content; if the number of short messages with identical content that have been sent reaches a first threshold and the length of the message content is greater than a second threshold, to record the calling number of the short message and the correspondence between the calling number and the hash value of the short message; otherwise, if the calling number of the short message appears for the first time, to do nothing, and if the calling number does not appear for the first time, to record the correspondence between the calling number and the hash value of the short message;
A sending mode statistics module, configured, when the number of short messages that have been sent with identical content and a content length greater than the second threshold reaches a third threshold, to obtain all calling numbers corresponding to those short messages; and, if the difference between the number of distinct hash values corresponding to all of these calling numbers and the number of these calling numbers is not greater than a fourth threshold, to judge the short messages with this identical content to be junk short messages.
The sent content processing module is also configured to set up a directory and a contents table;
The directory records the calling number of each short message and the set of hash values corresponding to the short messages sent by that calling number;
The contents table records the hash value of a short message, the length of the message content, the number of short messages with identical content, and the set of all calling numbers that sent short messages with that identical content;
The hash value of the received short message, the number of short messages with identical content that have been sent, and the length of the message content are recorded in the contents table.
Calculating the hash value of the received short message comprises:
If the length of the received message content is greater than the second threshold, determining the position of the first Chinese character and the position of the last Chinese character in the received short message, and calculating the hash value over the content from the first Chinese character to the last Chinese character; or
If the length of the received message content is less than or equal to the second threshold, calculating the hash value directly over the content of the received short message.
The sent content processing module is also configured to judge whether the following condition is met: the number of short messages with identical content that have been sent reaches the first threshold, and the length of the message content is greater than the second threshold;
If so: write the calling number of the received short message into the directory; if the calling number appears for the first time, save the calling number and the hash value corresponding to the short message in the directory, and record the calling number in the contents table; if the calling number already exists, compare the hash value corresponding to the short message with the hash values recorded for the calling number in the directory: if it differs, record the hash value corresponding to the short message in the directory and record the calling number in the contents table; if it is the same, do nothing;
Otherwise: if the calling number of the received short message appears for the first time, do nothing; if the calling number does not appear for the first time, compare the hash value corresponding to the received short message with the hash values recorded for the calling number in the directory: if it differs, record the hash value corresponding to the short message in the directory; if it is the same, do nothing.
The contents table also records a short message class attribute, which is one of: suspicious short message class, junk short message class, and normal short message class.
The sent content processing module is also configured, when the number of short messages with identical content that have been sent reaches the first threshold and the length of the message content is greater than the second threshold, to mark the short message class attribute corresponding to the short messages with this identical content as suspicious short message class in the contents table.
The sending mode statistics module is also configured to mark the short message class attribute corresponding to the hash value of the junk short message as junk short message class in the contents table.
The system further comprises a calling number sending module, configured to send the calling numbers of the junk short messages to the short message service center, so that the short message service center can filter junk short messages.
The system further comprises a management module, configured to periodically clear the normal short message classes from the contents table.
The present invention also provides a GSM system, comprising a short message service center and the junk short message recognition system based on sending behavior;
The junk short message recognition system based on sending behavior is attached to the short message service center in bypass mode, or is arranged inside the short message service center.
The invention ensures that junk short messages received by the short message service center are identified efficiently and in real time, and enables real-time interception of junk short messages.
Description of drawings
Fig. 1 is a structure diagram of the junk short message recognition system provided by the invention;
Fig. 2 shows the data structure provided by the invention;
Fig. 3 is a flow chart of junk short message identification provided by the invention;
Fig. 4 shows a network structure provided by the invention.
Embodiment
Short message communication between normal users is random and independent. Specifically:
1) The message content produced by a single calling number is random. In each communication, a calling number produces short messages of different lengths and different content.
2) The content and length of the short messages produced by different calling numbers generally differ from one another.
To achieve high throughput, short messages sent with a mass-sending device usually have fixed content and length, and each recipient is sent the message only once. Specifically:
A junk sender often uses multiple calling numbers, and these numbers produce only short messages of fixed length or content.
In the present invention, the analysis of sending behavior is further divided into two parts: sent content analysis and sending mode analysis.
1) The junk short messages sent by one junk sender within a period of time follow a regular pattern and form a highly clustered structure. After clustering, statistical analysis no longer has to target the massive number of individual short messages, only the much smaller set of short message classes. A short message class in which the number of duplicate short messages exceeds the set first threshold and the content length is greater than the second threshold is marked as a suspicious short message class.
2) For short messages subsequently assigned to a suspicious short message class, statistics are kept on the sending mode of the corresponding calling numbers. If the total number of short messages assigned to the suspicious class exceeds the third threshold, and the difference between the number of hash values produced by all calling numbers corresponding to the class and the number of those calling numbers is not greater than the fourth threshold, the class to which the short messages belong is a junk short message class.
The junk short message recognition system of the present invention is mainly divided into four parts, as shown in Fig. 1: a sent content processing module 101, a sending mode statistics module 102, a calling number sending module 103, and a management module 104. The sent content processing module 101 uses an efficient hashing algorithm to convert each original short message into a hash value that is easy to compute with and store, writes it into the contents table, and classifies the short message traffic using the results of hash value comparison; a short message class in which the number of duplicate short messages exceeds the set first threshold and the content length is greater than the second threshold is a suspicious short message class. The sending mode statistics module 102 keeps statistics on the calling number sending behavior of subsequent short messages in suspicious classes in order to identify junk short messages. The calling number sending module 103 extracts the calling numbers corresponding to a junk short message class and sends them to the short message service center, so that the short message service center can filter junk short messages. The calling number sending module 103 and the management module 104 are optional modules of the invention; the sent content processing module 101 and the sending mode statistics module 102 alone suffice to identify junk short messages.
While processing short messages, the present invention must frequently retrieve and compare the large numbers of calling numbers and hash values stored in memory, and must continually evict the calling numbers and hash values of normal short message classes. To support these operations, the invention provides a set of data structures consisting of two parts, a contents table and a directory, as shown in Fig. 2.
(1) Contents table C is organized as a hash table and is responsible for storing, retrieving, and organizing short message hash values. Each unit in the table corresponds to one short message class and contains the following fields: the hash value III corresponding to the message content, the content length IV, the number V of duplicate short messages in the class, the short message class attribute VI (one of three values: junk, suspicious, normal), and the set VII of all calling numbers that sent this short message.
(2) Directory N is organized as a hash table and is responsible for statistics on the short message sending mode. Each unit in the table corresponds to two parts: 1) the description of the calling number, comprising the calling number value I; 2) the set II of short message hash values produced by the calling number.
As can be seen, the directory and the contents table are associated with each other. Through the contents table, the calling numbers corresponding to all short messages in a class can be obtained; through the directory, all message content information sent by a number can be obtained.
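The two tables and their mutual association can be sketched in Python as follows. The class and function names are hypothetical, Python dictionaries stand in for the hash tables C and N, and the initial class attribute of "normal" is an assumption; the Roman numerals in the comments refer to the fields listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class MessageClass(Enum):
    NORMAL = "normal"
    SUSPICIOUS = "suspicious"
    JUNK = "junk"


@dataclass
class ContentEntry:
    """One unit of contents table C."""
    hash_value: str                                # III
    length: int                                    # IV, content length in bytes
    count: int = 0                                 # V, duplicate messages seen
    attribute: MessageClass = MessageClass.NORMAL  # VI (initial value assumed)
    callers: set = field(default_factory=set)      # VII


@dataclass
class DirectoryEntry:
    """One unit of directory N."""
    caller: str                                    # I
    hashes: set = field(default_factory=set)       # II


def callers_of(contents_table, hash_value):
    """Contents table lookup: all calling numbers recorded for a class."""
    entry = contents_table.get(hash_value)
    return set(entry.callers) if entry else set()


def hashes_of(directory, caller):
    """Directory lookup: all hash values recorded for a calling number."""
    entry = directory.get(caller)
    return set(entry.hashes) if entry else set()
```

Keying the contents table by hash value and the directory by calling number gives constant-time lookups in both directions, which is what the sending mode statistics rely on.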
The directory holds one type of data: the calling numbers of short messages. Only when a short message falls into a suspicious short message class is its calling number extracted and written into the directory.
The contents table holds one type of data: a hash operation is performed on each short message, and the hash value and the length of the short message are added to the contents table.
The sent content processing module 101 scans the short message traffic in real time, gathers short messages with repeated content into one class, and counts the short messages in the class. If the count exceeds the set first threshold $f_0$ and the length of the message content is greater than the second threshold, the class is a suspicious short message class.
A common way to identify junk short messages by content repetition is to perform a hash operation on the message content to produce a hash value, and then to carry out comparison and other operations on this value. Compared with keyword recognition and the black and white list mechanism, this method performs association analysis on the content sent by multiple calling numbers and achieves a better recognition rate and better real-time performance.
The problem of identifying repeated message content is described below. Regard the body of a short message (hereinafter simply "short message") as a byte sequence $M = b_1 b_2 \ldots b_x$ of length $x$; the length of $M$ is denoted length(M). As one aspect of studying the clustering property of short messages, the question of interest is whether, given k short messages, any of their contents are repeated.
One feasible method is therefore to compare the byte sequences of the short messages one by one. To improve comparison efficiency, a data structure T is used to store the content of the short messages already seen. When a new short message arrives, it is first compared with the elements in T; if it is not in T, it is added to T; otherwise the input is discarded and the count of short messages with that identical content is updated. Clearly, to complete retrieval, comparison, and counting quickly, to reduce memory overhead, and to keep the algorithm usable, organizing T as a hash table is the most natural approach.
There are usually two hashing methods. One hashes the entire message content, so that each message content corresponds to one hash value; this method works well for short hash objects. The other hashes several byte subsequences of the message content, so that each short message corresponds to a set of hash values; this method works better for longer hash objects. Considering that short message content is short (the maximum length is 140 bytes) and that, to remain readable, the message content sent within a period of time cannot change arbitrarily, the first hashing method is chosen in the present invention. At the same time, the range of hash values must be large enough that a hash value uniquely represents the original short message. If two hash values are unequal, the original short messages they represent are different; if two hash values are equal, the probability that the original short messages they represent are different is extremely small.
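The first hashing method, combined with duplicate counting in a hash table, can be illustrated with a short Python sketch. SHA-256 is an assumed choice that provides a large hash range (the text does not name a hash function), and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def whole_message_hash(body: bytes) -> str:
    """Hash the entire message body (first hashing method)."""
    return hashlib.sha256(body).hexdigest()


def count_duplicates(bodies):
    """Return a Counter mapping each digest to the number of identical bodies."""
    return Counter(whole_message_hash(body) for body in bodies)
```

Storing fixed-length digests instead of raw bodies keeps the table compact, but any byte of difference between two bodies yields a different digest, so only exact duplicates are grouped.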
However, recent statistics show that most junk short message content produced by group-sending devices has the following characteristics:
1) The length is generally greater than 80 bytes.
2) The junk sender adds random characters at the head and tail of the short message, so that the content of each short message differs by a few bytes.
The junk sender's main goal is for users to be able to read the short message, and short message content is very short, so the sender can make only limited modifications to it. The usual approach is to add automatically incrementing strings (such as letters or digits) at the head and tail of the short message. If the content were generated completely at random, nobody could tell what the short message expresses, and the junk sender's goal would not be achieved.
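The effect of such padding on whole-content hashing can be illustrated with a short sketch. MD5 is the algorithm named later in this description; the sample text, the padding strings, and the UTF-8 encoding are assumptions chosen only for the example.

```python
import hashlib


def whole_hash(body: str) -> str:
    """Hash the entire message body (the first hashing method)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


core = "本店新到商品全场八折欢迎选购"   # the readable part shared by every copy
variant_a = "a1" + core + "x7"          # padded copy sent to one recipient
variant_b = "b2" + core + "y8"          # padded copy sent to another recipient

# The two copies carry the same readable content but hash to different values,
# so they would be counted as two unrelated messages.
same_class = whole_hash(variant_a) == whole_hash(variant_b)
```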
Clearly, hashing the entire short message content cannot handle such short messages. The present invention addresses this problem as follows.
1) For a short message whose length is greater than the second threshold of K bytes, the following operations are performed:
When a short message arrives, its whole content is first processed to determine the position s of its first Chinese character and the position e of its last Chinese character. The content subsequence that begins at position s and ends at position e is extracted, and a hash computation on it yields the corresponding hash value. The contents table is then queried. If the hash value appears for the first time, it is saved in the contents table, that is, a new short message class is created in the contents table and the number of short messages in that class is set to 1. If the short message class corresponding to the hash value already exists in the contents table, the number of short messages in that class is increased by 1.
2) For a short message whose length is less than or equal to K bytes, the hash operation is applied to it directly to obtain the corresponding hash value. The contents table is then queried. If the hash value appears for the first time, it is saved in the contents table, that is, a new short message class is created in the contents table and the number of short messages in that class is set to 1. If the short message class corresponding to the hash value already exists in the contents table, the number of short messages in that class is increased by 1.
If the number of short messages in a short message class in the contents table exceeds the configured first threshold $f_0$ and the length of the short message is greater than K bytes, the short message class is marked as a suspicious short message class.
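The contents table update and the suspicious marking can be sketched as follows. This is a minimal illustration, not the patented implementation: the `MessageClass` record, its field names, and the `record_message` function are hypothetical, and the defaults 100 and 80 are the values this description gives for the embodiment.

```python
from dataclasses import dataclass, field

F0 = 100      # first threshold f0 (embodiment value)
K_BYTES = 80  # second threshold K in bytes (embodiment value)


@dataclass
class MessageClass:
    length: int                  # length of the message content in bytes
    count: int = 0               # messages seen with this hash value
    suspicious: bool = False
    callers: set = field(default_factory=set)


def record_message(contents, digest, length, f0=F0, k=K_BYTES):
    """Create or update the class for `digest` and mark it suspicious if needed."""
    cls = contents.get(digest)
    if cls is None:
        cls = contents[digest] = MessageClass(length=length)  # new class
    cls.count += 1                                            # becomes 1 for a new class
    if cls.count > f0 and length > k:
        cls.suspicious = True
    return cls
```

A dictionary keyed by hash value gives the constant-time lookup that the description asks of the hash table.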
In the present invention, the hash algorithm is MD5 and K is 80 bytes. For a short message of at most 80 bytes, the hash object is the whole short message. For a short message longer than 80 bytes, the hash object is the content from the first Chinese character to the last Chinese character. The contents table stores the corresponding hash values.
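A minimal sketch of this hash-object selection is given below. Several points are assumptions not fixed by the description: a "Chinese character" is taken to be a code point in the CJK Unified Ideographs block, the byte length is measured in UTF-16-BE (two bytes per character, similar to UCS-2 short messages), and a long message without any Chinese character falls back to whole-content hashing. The function names are hypothetical.

```python
import hashlib

K_BYTES = 80  # second threshold K from the description


def is_chinese(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block counts as Chinese.
    return "\u4e00" <= ch <= "\u9fff"


def message_hash(body: str, encoding: str = "utf-16-be") -> str:
    """Return the MD5 hex digest of the hash object chosen for `body`."""
    raw = body.encode(encoding)
    if len(raw) > K_BYTES:
        positions = [i for i, ch in enumerate(body) if is_chinese(ch)]
        if positions:  # assumption: otherwise hash the whole message
            s, e = positions[0], positions[-1]
            raw = body[s:e + 1].encode(encoding)
    return hashlib.md5(raw).hexdigest()


core = "优惠" * 25  # 50 characters, 100 bytes in UTF-16-BE
padded_a = "ab" + core + "zz"
padded_b = "x9" + core + "q1"
```

With this selection, `padded_a` and `padded_b` map to the same hash value because their head and tail padding lies outside the span from the first to the last Chinese character.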
If the current short message falls into a suspicious short message class, its calling number is written to the directory. If the number appears for the first time, it is saved in the directory, the hash value of the short message is written to the number's hash value set field in the directory, and the number is recorded in the calling number set field of the short message class in the contents table. If the number already exists, the hash value of the current short message is compared with the hash values already stored in the number's directory entry: if it differs from them, the new hash value is added to that entry and the number is recorded in the calling number set field of the short message class in the contents table; otherwise no operation is performed.
If the current short message does not fall into a suspicious short message class, then: if the calling number appears for the first time, no operation is performed; if the calling number has appeared before, the hash value of the current short message is compared with the hash values already stored for it in the directory, and the new hash value is added to the corresponding entry if it differs, while no operation is performed if it is the same.
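The two cases above can be sketched as one update function. The representation is an assumption made for illustration: the directory is a dictionary from calling number to a set of hash values, and `class_callers` stands for the calling number set field of the message class in the contents table. The function name is hypothetical.

```python
def update_directory(directory, class_callers, caller, digest, suspicious):
    """Apply the directory rules for one message."""
    if suspicious:
        # A first-time number gets a new, empty entry; an existing one is reused.
        hashes = directory.setdefault(caller, set())
        if digest not in hashes:
            hashes.add(digest)
            class_callers.add(caller)
        # Same hash already stored for this number: nothing to do.
    elif caller in directory:
        # Known number, non-suspicious class: remember the hash if it is new.
        directory[caller].add(digest)
    # First-time number with a non-suspicious message: nothing is recorded.
```

Using a set for the hash values makes the "compare, then add if different" rule a single membership test.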
The sending mode statistics module 102 identifies junk short messages.
If the number of duplicate short messages in a suspicious short message class exceeds the third threshold $f_1$, the sending mode statistics module 102 obtains the set of all calling numbers corresponding to the short message class and computes statistics on these calling numbers in the directory. If the difference between the number of hash values produced by these numbers and the number of calling numbers is not greater than the fourth threshold $f_2$, the suspicious class is marked as a junk short message class, and all calling numbers of the class are submitted to the calling number sending module 103.
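The decision rule can be sketched as below. One interpretation is assumed and should be read as such: "the number of hash values produced by these numbers" is taken to be the number of distinct hash values across all of the class's calling numbers. The function name is hypothetical, and the defaults 1000 and 0 are the embodiment values given in this description.

```python
F1 = 1000  # third threshold f1 (embodiment value)
F2 = 0     # fourth threshold f2 (embodiment value)


def is_junk_class(count, callers, directory, f1=F1, f2=F2):
    """Decide whether a suspicious class should be marked as junk."""
    if count <= f1:
        return False
    distinct_hashes = set()
    for caller in callers:
        distinct_hashes |= directory.get(caller, set())
    # Few distinct contents spread over many numbers indicates group sending.
    return len(distinct_hashes) - len(callers) <= f2
```

For example, two numbers that have each sent only one and the same content give a difference of 1 - 2 = -1, which satisfies the rule for $f_2 = 0$, while a single number with four different contents gives 4 - 1 = 3 and does not.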
The calling number sending module 103 sends the calling numbers to the short message service center, which uses them to filter junk short messages.
Administration module 104 will regularly be deleted the normal short message class in the contents table, to guarantee the availability of internal memory.Usually, administration module 104 is in idle state, and is every at a distance from one-period t, normal short message class in the automatic scavenge system of administration module.
The present invention ensures that junk short messages received by the short message service center are identified, classified, and handled efficiently and in real time, enabling real-time interception of junk short messages. First, based on the heavy repetition that characterizes group-sent junk short messages, the invention designs an effective hash algorithm that stores short message content compactly and, on that basis, aggregates and classifies the short message traffic by content, which makes online identification of junk short messages possible. The invention further uses the sending behavior of calling numbers, so that junk short messages produced by group-sending devices are identified effectively while short messages sent in batches by individual users do not cause false positives.
The flow provided by the present invention is shown in Fig. 3, where parameter $f_0$ is 100, parameter $f_1$ is 1000, and parameter $f_2$ is 0 (other values such as 1, 2, -1, or -2 may also be used). It comprises the following steps:
Step 301: initialize by building the contents table that stores hash values and the directory that stores number information; receive a new short message.
Step 302: determine whether the short message length is greater than K bytes; if so, go to step 303, otherwise go to step 304.
Step 303: determine the positions of the first and last Chinese characters of the short message, then go to step 304.
Step 304: for a short message of at most K bytes, calculate the hash value of the received short message directly; for a short message longer than K bytes, calculate the hash value of the content from the first Chinese character to the last Chinese character.
Step 305: determine whether the hash value is present in the contents table; if so, go to step 306, otherwise go to step 307.
Step 306: update the contents table by increasing the occurrence count of the hash value by 1, then go to step 308.
Step 307: add the hash value to the contents table and set its occurrence count to 1.
Step 308: determine whether the occurrence count of the hash value is greater than $f_0$ and, at the same time, the short message length is greater than K bytes; if so, go to step 309, otherwise go to step 312.
Step 309: determine whether the calling number appears for the first time; if so, go to step 310, otherwise go to step 311.
Step 310: write the calling number and the corresponding hash value to the directory, insert the calling number into the calling number set field in the contents table, then go to step 314.
Step 311: write any hash value not yet recorded in the directory to the directory, insert the calling number into the calling number set field in the contents table, then go to step 314.
Step 312: determine whether the calling number appears for the first time; if so, go to step 317, otherwise go to step 313.
Step 313: write any hash value not yet recorded in the directory to the directory, then go to step 317.
Step 314: if the number of duplicate short messages in the short message class exceeds the threshold $f_1$, go to step 315; otherwise go to step 317.
Step 315: determine whether the difference between the number of hash values produced by all the calling numbers and the number of calling numbers is not greater than the fourth threshold; if so, go to step 316, otherwise go to step 317.
Step 316: determine that the short message class of this short message is a junk short message class, and send the corresponding calling numbers to the short message service center for filtering junk short messages; then go to step 317.
Step 317: finish processing the current short message and prepare to receive the next short message.
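Steps 301 to 317 can be condensed into one function, sketched here under stated assumptions rather than as the definitive implementation. The assumptions are: UTF-16-BE byte lengths, the CJK Unified Ideographs block as the definition of a Chinese character, whole-content hashing when a long message contains no Chinese character, dictionary-based tables, and distinct hash values counted across all of a class's calling numbers in step 315. The names `digest_of` and `process` are hypothetical.

```python
import hashlib

K, F0, F1, F2 = 80, 100, 1000, 0  # K in bytes; f0, f1, f2 as in the embodiment


def digest_of(body, k=K):
    """Steps 302-304: pick the hash object and return (digest, byte length)."""
    raw = body.encode("utf-16-be")  # assumed encoding
    length = len(raw)
    if length > k:
        idx = [i for i, c in enumerate(body) if "\u4e00" <= c <= "\u9fff"]
        if idx:
            raw = body[idx[0]:idx[-1] + 1].encode("utf-16-be")
    return hashlib.md5(raw).hexdigest(), length


def process(contents, directory, caller, body, k=K, f0=F0, f1=F1, f2=F2):
    """Handle one message; return the numbers to report, or an empty set."""
    digest, length = digest_of(body, k)
    # Steps 305-307: find or create the message class, then count the message.
    cls = contents.setdefault(digest, {"count": 0, "callers": set(), "kind": "normal"})
    cls["count"] += 1
    if cls["count"] > f0 and length > k:                 # step 308
        if cls["kind"] == "normal":
            cls["kind"] = "suspicious"
        # Steps 309-311 as worded; set insertion is idempotent.
        directory.setdefault(caller, set()).add(digest)
        cls["callers"].add(caller)
        if cls["count"] > f1:                            # step 314
            seen = set().union(*(directory[c] for c in cls["callers"]))
            if len(seen) - len(cls["callers"]) <= f2:    # step 315
                cls["kind"] = "junk"                     # step 316
                return set(cls["callers"])
    elif caller in directory:                            # steps 312-313
        directory[caller].add(digest)
    return set()                                         # step 317
```

In a deployment of the kind described here, the returned numbers would be handed to the calling number sending module; in this sketch they are simply returned to the caller.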
Fig. 4 depicts the network configuration of the present invention. The junk short message recognition system is connected to the short message service center as an independent network element (that is, it is deployed as a bypass to the short message service center). It obtains a mirror of the short message traffic from the short message service center and does not affect the center's normal processing. Once junk short messages are found, the recognition system passes the corresponding calling numbers to the short message service center, so that subsequent junk short messages from the group-sending device are intercepted promptly.
The system of the present invention may of course also be implemented directly as a software module within the short message service center to identify the short message traffic passing through it.
Those skilled in the art may make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention is therefore not limited to the above description but is determined by the scope of the claims.

Claims (17)

1. A junk short message recognition method based on sending behavior, characterized by comprising:
Step 1: calculating the hash value of a received short message and, according to the hash value, recording the number of short messages with identical content that have been sent and the length of the short message content;
Step 2: if the number of short messages with identical content that have been sent reaches a first threshold and the length of the short message content is greater than a second threshold, recording the calling number of the short message and recording the correspondence between the calling number and the hash value of the short message; otherwise, if the calling number of the short message appears for the first time, performing no operation, and if the calling number has appeared before, recording the correspondence between the calling number and the hash value of the short message;
Step 3: if the number of short messages that have been sent with identical content and a content length greater than the second threshold reaches a third threshold, obtaining all calling numbers corresponding to those short messages; if the difference between the number of distinct hash values corresponding to all these calling numbers and the number of these calling numbers is not greater than a fourth threshold, determining that the short messages with identical content are junk short messages,
wherein calculating the hash value of the received short message comprises:
if the length of the received short message content is greater than the second threshold, determining the position of the first Chinese character and the position of the last Chinese character of the received short message, and calculating the hash value from the content spanning the first Chinese character to the last Chinese character; or
if the length of the received short message content is less than or equal to the second threshold, calculating the hash value directly from the content of the received short message.
2. The junk short message recognition method of claim 1, characterized in that, in step 1, a directory and a contents table are also set up;
the directory records the calling number of a short message and the set of hash values corresponding to the short messages sent by that calling number;
the contents table records the hash value of a short message, the length of the short message content, the number of short messages with identical content, and the set of all calling numbers that sent the short messages with identical content;
the hash value of the received short message, the number of short messages with identical content that have been sent, and the length of the short message content are recorded in the contents table.
3. The junk short message recognition method of claim 2, characterized in that step 2 comprises:
Step 41: determining whether the following condition is met: the number of short messages with identical content that have been sent reaches the first threshold, and the length of the short message content is greater than the second threshold; if so, performing step 42, otherwise performing step 43;
Step 42: writing the calling number of the received short message to the directory: if the calling number appears for the first time, saving the calling number and the hash value corresponding to the short message in the directory, and recording the calling number in the contents table; if the calling number already exists, comparing the hash value corresponding to the short message with the hash values corresponding to the calling number in the directory: if they differ, recording the hash value corresponding to the short message in the directory and recording the calling number in the contents table; if they are the same, performing no operation;
Step 43: if the calling number of the received short message appears for the first time, performing no operation; if the calling number has appeared before, comparing the hash value corresponding to the received short message with the hash values corresponding to the calling number in the directory: if they differ, recording the hash value corresponding to the short message in the directory; if they are the same, performing no operation.
4. The junk short message recognition method of claim 2, characterized in that the contents table also records a short message class attribute, the short message class attribute being one of suspicious short message class, junk short message class, and normal short message class.
5. The junk short message recognition method of claim 4, characterized in that, in step 2, if the number of short messages with identical content that have been sent reaches the first threshold and the length of the short message content is greater than the second threshold, the short message class attribute corresponding to the short messages with identical content is marked in the contents table as suspicious short message class.
6. The junk short message recognition method of claim 5, characterized in that, in step 3, the short message class attribute corresponding to the hash value of the junk short message is also marked as junk short message class.
7. The junk short message recognition method of claim 1, characterized by further comprising step 4: sending the calling number of the junk short message to the short message service center for the short message service center to filter junk short messages.
8. The junk short message recognition method of claim 4, characterized by further comprising step 5: periodically clearing the normal short message classes from the contents table.
9. A junk short message recognition system based on sending behavior, characterized by comprising:
a sending content processing module, configured to calculate the hash value of a received short message and, according to the hash value, record the number of short messages with identical content that have been sent and the length of the short message content; if the number of short messages with identical content that have been sent reaches a first threshold and the length of the short message content is greater than a second threshold, to record the calling number of the short message and the correspondence between the calling number and the hash value of the short message; otherwise, if the calling number of the short message appears for the first time, to perform no operation, and if the calling number has appeared before, to record the correspondence between the calling number and the hash value of the short message;
a sending mode statistics module, configured to obtain, when the number of short messages that have been sent with identical content and a content length greater than the second threshold reaches a third threshold, all calling numbers corresponding to those short messages; and, if the difference between the number of distinct hash values corresponding to all these calling numbers and the number of these calling numbers is not greater than a fourth threshold, to determine that the short messages with identical content are junk short messages,
wherein calculating the hash value of the received short message comprises:
if the length of the received short message content is greater than the second threshold, determining the position of the first Chinese character and the position of the last Chinese character of the received short message, and calculating the hash value from the content spanning the first Chinese character to the last Chinese character; or
if the length of the received short message content is less than or equal to the second threshold, calculating the hash value directly from the content of the received short message.
10. The junk short message recognition system based on sending behavior of claim 9, characterized in that the sending content processing module is also configured to set up a directory and a contents table;
the directory records the calling number of a short message and the set of hash values corresponding to the short messages sent by that calling number;
the contents table records the hash value of a short message, the length of the short message content, the number of short messages with identical content, and the set of all calling numbers that sent the short messages with identical content;
the hash value of the received short message, the number of short messages with identical content that have been sent, and the length of the short message content are recorded in the contents table.
11. The junk short message recognition system based on sending behavior of claim 10, characterized in that
the sending content processing module is also configured to determine whether the following condition is met: the number of short messages with identical content that have been sent reaches the first threshold, and the length of the short message content is greater than the second threshold;
if so: to write the calling number of the received short message to the directory; if the calling number appears for the first time, to save the calling number and the hash value corresponding to the short message in the directory and record the calling number in the contents table; if the calling number already exists, to compare the hash value corresponding to the short message with the hash values corresponding to the calling number in the directory: if they differ, to record the hash value corresponding to the short message in the directory and record the calling number in the contents table; if they are the same, to perform no operation;
otherwise: if the calling number of the received short message appears for the first time, to perform no operation; if the calling number has appeared before, to compare the hash value corresponding to the received short message with the hash values corresponding to the calling number in the directory: if they differ, to record the hash value corresponding to the short message in the directory; if they are the same, to perform no operation.
12. The junk short message recognition system based on sending behavior of claim 10, characterized in that the contents table also records a short message class attribute, the short message class attribute being one of suspicious short message class, junk short message class, and normal short message class.
13. The junk short message recognition system based on sending behavior of claim 12, characterized in that the sending content processing module is also configured to mark, in the contents table, the short message class attribute corresponding to the short messages with identical content as suspicious short message class when the number of short messages with identical content that have been sent reaches the first threshold and the length of the short message content is greater than the second threshold.
14. The junk short message recognition system based on sending behavior of claim 13, characterized in that the sending mode statistics module is also configured to mark, in the contents table, the short message class attribute corresponding to the hash value of the junk short message as junk short message class.
15. The junk short message recognition system based on sending behavior of claim 9, characterized by further comprising a calling number sending module, configured to send the calling number of the junk short message to the short message service center for the short message service center to filter junk short messages.
16. The junk short message recognition system based on sending behavior of claim 12, characterized by further comprising a management module, configured to periodically clear the normal short message classes from the contents table.
17. A GSM system comprising a short message service center, characterized by further comprising the junk short message recognition system based on sending behavior of claim 9;
the junk short message recognition system based on sending behavior is deployed as a bypass to the short message service center, or is arranged within the short message service center.
CN2008102242531A 2008-10-14 2008-10-14 Rubbish short message recognition system and method based on sending behavior Active CN101389085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102242531A CN101389085B (en) 2008-10-14 2008-10-14 Rubbish short message recognition system and method based on sending behavior

Publications (2)

Publication Number Publication Date
CN101389085A CN101389085A (en) 2009-03-18
CN101389085B true CN101389085B (en) 2012-03-21

Family

ID=40478201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102242531A Active CN101389085B (en) 2008-10-14 2008-10-14 Rubbish short message recognition system and method based on sending behavior

Country Status (1)

Country Link
CN (1) CN101389085B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant