CN101686444B - System and method for detecting spam SMS sender number in real time - Google Patents

Publication number
CN101686444B
Authority
CN
China
Prior art keywords
rule
center node
spam sms
sms sender
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200810168774
Other languages
Chinese (zh)
Other versions
CN101686444A (en)
Inventor
王晨
李洁
陆薇
田启明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to CN 200810168774
Publication of CN101686444A
Application granted
Publication of CN101686444B
Legal status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a system and a method for detecting spam SMS sender numbers in real time. The system comprises an event handling engine and a graph analysis engine. The event handling engine acquires an SMS event stream in real time, performs rule matching according to preset time parameters and their rules, and extracts potential spam SMS sender numbers. The graph analysis engine receives each number extracted by the event handling engine, acquires the social network data of that number, and performs spatial behavior pattern analysis using at least one preset spatial feature together with its parameters and rules, so as to determine whether the number is a real spam SMS sender number. In one embodiment, real spam SMS senders can thus be detected accurately and in real time.

Description

Spam SMS sender number real-time detecting system and method
Technical field
The application relates to a system and a method for detecting spam SMS sender numbers in real time.
Background technology
Spam SMS (spam short messages) is defined as bulk short messages with illegal or advertising content that a particular organization or individual sends to telecommunications end users, whose mobile numbers either fall within a fixed number range or were obtained by irregular means. Spam SMS disrupts the daily life of ordinary mobile phone users. Without an effective solution to this problem, that is, an effective method of detecting spam SMS sender numbers, every mobile phone will become a junk bin from which the end user can hardly obtain any useful information. Another serious problem is that spam SMS senders hold the personal information of large numbers of mobile phone users, which raises privacy and security concerns for customers. In addition, spam SMS senders constantly try to abuse tariff packages that offer very cheap telecommunication services, which seriously affects the normal revenue of telecom operators.
Telecom operators have tried to analyze the characteristics of spam SMS sender numbers, which include: a spam SMS sender number exists only for a short time (less than 3 months on average), and the rate of defaulting subscribers among such numbers is relatively high (above 41%). At the same time, a misjudgment, that is, wrongly identifying a normal user as a spam SMS sender, leads to more serious customer complaints. Detection of spam SMS sender numbers therefore needs to be both real-time and correct.
Operators have adopted several solutions for detecting spam SMS sender numbers, such as monitoring in real time the number of messages a number sends, or monitoring message content. All of these solutions have limitations. China Mobile uses one such solution: a threshold is set on the number of messages sent within a 1-minute interval; if a sender sends more messages than the threshold within 1 minute, the sender is assumed to be a spam SMS sender and the number is locked. This solution can identify some types of real spam SMS senders, but on special occasions such as the Chinese Spring Festival, ordinary users may use SMS to send greetings to all their friends. A simple threshold-based detection rule can then mistake ordinary users for spam SMS senders and lock their numbers, causing economic loss to the operator. Moreover, more and more real spam SMS senders use automatic sending systems, such as SMS modems, to imitate the normal sending behavior of ordinary users and avoid detection. Monitoring based on message content requires many more matching rules, and spam SMS senders can evade it by obfuscating the content or inserting extra characters, for example writing "invoice" as "a * ticket" or "fa ticket". In addition, because text retrieval and matching are time-consuming, this approach is essentially inefficient for large data volumes.
The business intelligence systems of many operators include data-mining-based techniques for detecting service provider fraud and potential defaulting subscribers with abnormal bills. All of these techniques are used in offline systems with periodic data collection and analysis, and cannot guarantee real-time detection.
Summary of the invention
In one embodiment of the invention, a system for detecting spam SMS sender numbers in real time is provided, comprising: an event handling engine for acquiring an SMS event stream in real time, performing rule matching according to preset time parameters and their rules, and extracting potential spam SMS sender numbers; and a graph analysis engine for receiving the numbers extracted by the event handling engine, acquiring the social network data of each number, and performing spatial behavior pattern analysis using at least one predetermined spatial feature together with its parameters and rules, so as to judge whether the number is a real spam SMS sender number.
In another embodiment of the invention, a method for detecting spam SMS sender numbers in real time is provided, comprising: acquiring an SMS event stream in real time, performing rule matching according to preset time parameters and their rules, and extracting potential spam SMS sender numbers; and acquiring the social network data of each extracted number and performing spatial behavior pattern analysis using at least one predetermined spatial feature together with its parameters and rules, so as to judge whether the number is a real spam SMS sender number.
According to one embodiment of the invention, the following advantages can be obtained:
1. Pattern recognition based on data mining can supplement the operator's knowledge, whereas current spam SMS sender number detection systems can only rely on the operator's experience to set up simple decision rules.
2. Online, time-based detection of potential spam SMS sender numbers over the event stream can use existing stream data processing systems to provide an effective detection solution.
3. The space-based user social network graph further helps to filter the potential spam SMS senders accurately and identify the real ones; this meets the correctness requirement of telecom operators and so improves their revenue while avoiding potential economic loss.
4. Real spam SMS senders are detected in real time and correctly.
Description of drawings
Fig. 1 illustrates the system architecture according to an embodiment of the invention.
Fig. 2A and Fig. 2B illustrate one-degree subgraphs.
Fig. 3A and Fig. 3B illustrate two-degree subgraphs.
Fig. 4 A and Fig. 4 B illustrate the flow chart according to the method for an embodiment of the invention.
Embodiment
A main idea of the invention is to provide a new solution that detects real spam SMS sender numbers effectively and correctly. In one embodiment of the invention, to achieve this, a layered architecture is designed that integrates monitoring and analysis of the temporal and spatial features of user behavior; it can be divided into an offline part and an online part.
Spam SMS is usually sent automatically by machines, and over a relatively long period its sending pattern does follow certain statistical regularities. By recording, for each SMS user at each point in time, the sending number, the recipients, and the sending interval and frequency as an event (that is, an SMS summary), sending behavior can be described as an event sequence. Using the operator's experience and data mining techniques, rules can be defined and deployed in the SMS center to perform rule matching on the event sequences and find potential spam SMS sender numbers. Real-time detection of spam SMS sender numbers on this basis is therefore useful. One problem remains: some normal senders may show the same behavior at certain times; for example, an event organizer may send a notice to many users at once, which looks like spam SMS behavior. Further verification is therefore important to avoid complaints from normal users.
The online part, that is, the online detection subsystem for monitoring and analyzing spam SMS sender numbers, can be further divided into two sublayers: a low-level temporal analysis engine and a high-level graph analysis engine. The low-level temporal analysis engine is a time-based user behavior pattern analyzer; it uses the event stream retrieved from the SMS center's records to find potential spam SMS sender numbers in real time. The high-level graph analysis engine is a space-based user social network pattern analyzer; it uses network graphs to further filter the potential spam SMS sender numbers from the low level so as to detect real spam SMS sender numbers accurately.
The offline part, that is, the offline mining subsystem, focuses on historical data analysis. Using data mining techniques, it mines spam SMS sender behavior patterns from long-term statistical records and helps the operator identify and confirm particular patterns, assisting in setting up the decision rules of the high-level online detection subsystem.
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates the system architecture according to one embodiment of the invention, and also shows the environment in which this embodiment is implemented. The system of this embodiment comprises an online detection module 1 and an offline pattern mining module 2.
Online detection module 1 comprises event handling engine 11 and graph analysis engine 12.
Event handling engine 11 performs rule matching based on the event stream. Every SMS message sent passes through event handling engine 11, which processes each passing event as a stream in real time, that is, performs rule matching according to preset time parameters and rules (detailed later). All potential spam SMS messages are captured, and the sending numbers of these messages are reported to graph analysis engine 12 for further verification. Here, event handling engine 11 analyzes the inherent regularities of spam SMS senders' temporal behavior: it performs temporal analysis, that is, time behavior pattern analysis, extracts the sending numbers that match the rules, and reports them to graph analysis engine 12.
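As an illustration of time-based rule matching on an event stream, the following minimal Python sketch flags senders that exceed a message count within a sliding time window. The function name, the event tuple layout, and the window and threshold values are assumptions made for this example; the patent does not specify the actual time parameters or rules.

```python
from collections import defaultdict, deque


def find_suspicious_senders(events, window_seconds=60, max_messages=30):
    """Return the senders that exceed max_messages inside any sliding window.

    events: iterable of (timestamp_seconds, sender_id, recipient_id) tuples,
    assumed to be sorted by timestamp (hypothetical event layout).
    """
    recent = defaultdict(deque)  # sender -> timestamps still inside the window
    suspicious = set()
    for timestamp, sender, _recipient in events:
        window = recent[sender]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while window and timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_messages:
            suspicious.add(sender)
    return suspicious


# Example: "A" sends five messages in five seconds, "B" sends two far apart.
example_events = sorted(
    [(i, "A", f"r{i}") for i in range(5)] + [(0, "B", "x"), (100, "B", "y")]
)
suspects = find_suspicious_senders(example_events, window_seconds=60, max_messages=3)
```

In the architecture described here, numbers returned by such a rule would only be candidates; they would still be passed on for graph-based verification.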
Graph analysis engine 12 performs network graph analysis, that is, spatial behavior pattern analysis of the social network of a caller or SMS sender (detailed later). Graph analysis engine 12 analyzes the topology of the user's call or SMS graph (the telecom social network).
Based on a specific kind of telecommunication connection, such as call or SMS records, the connections between mobile phone users within a time period form a directed network graph whose nodes represent users; this is called the telecom social network. For spam SMS sender detection, the topology of this directed graph describes the users' spatial behavior during the time period. Statistical features can be extracted from the telecom social network and analyzed to help detect spam SMS sender behavior. Because these are spatial statistical features, spam SMS behavior toward target recipients is difficult to hide.
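A directed graph of this kind can be sketched in a few lines of Python. The representation below (a dictionary from directed edge to message count) and the function name are assumptions chosen for illustration, not a structure described in the patent.

```python
from collections import defaultdict


def build_sms_graph(records):
    """Build a directed, weighted graph from (sender, recipient) records.

    Returns a dict mapping each directed edge (sender, recipient) to the
    number of messages sent along it during the observed period.
    """
    edge_weights = defaultdict(int)
    for sender, recipient in records:
        edge_weights[(sender, recipient)] += 1
    return dict(edge_weights)


# Example: "a" sends two messages to "b" and one to "c"; "b" replies once.
graph = build_sms_graph([("a", "b"), ("a", "b"), ("b", "a"), ("a", "c")])
```

Keeping one weight per direction preserves the sending direction that the later features (in-degree, out-degree, bidirectional edges) depend on.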
The analysis of the telecom social network topology can be divided into three progressively deeper steps, and all the required data comes from the operator's detailed record system. The following description uses the SMS service as an example.
In the first step, one-degree subgraph analysis is performed. As shown in Fig. 2A and Fig. 2B, a one-degree subgraph represents the contact behavior between a caller or SMS sender (the center node in the figure) and its direct contacts (all nodes directly connected to the center node). Fig. 2A shows the one-degree subgraph of a potential spam SMS sender: the center node represents the potential spam SMS sender, the highlighted edges connected to the center node represent its contact behavior with its direct contacts (the potential victims), and the arrow on each edge indicates the sending direction. By contrast, Fig. 2B shows the one-degree subgraph of a normal SMS user: the center node represents a specific normal SMS user (the sender of interest), the highlighted edges connected to the center node represent that user's contact behavior with its direct contacts, and the arrow on each edge indicates the sending direction. The one-degree subgraph thus shows the current sender's spatial relationship behavior. Comparing Fig. 2A and Fig. 2B shows that, for most normal SMS users, the out-degree (sending) and in-degree (receiving) indicated by the arrows on the edges connected to the center node are roughly balanced, that is, sending and receiving behavior are roughly balanced. For a potential spam SMS sender, however, the out-degree is far greater than the in-degree; the behavior consists mainly of mass sending, because the purpose is to send spam SMS rather than to communicate normally.
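The balance between sending and receiving can be checked with a short Python sketch. It assumes the graph is stored as a dictionary from directed edge (source, destination) to message count, and it counts degrees as message totals; both choices and the function name are assumptions for illustration.

```python
def degree_profile(edge_weights, center):
    """Return (in_degree, out_degree, in_out_ratio) for the center node.

    edge_weights: dict mapping (source, destination) -> message count.
    The ratio is None when the center node has sent nothing.
    """
    in_degree = sum(w for (src, dst), w in edge_weights.items() if dst == center)
    out_degree = sum(w for (src, dst), w in edge_weights.items() if src == center)
    ratio = in_degree / out_degree if out_degree else None
    return in_degree, out_degree, ratio


# Example: "s" sends to four contacts and receives a single reply.
star_edges = {("s", "a"): 1, ("s", "b"): 1, ("s", "c"): 1, ("s", "d"): 1, ("a", "s"): 1}
profile = degree_profile(star_edges, "s")
```

A ratio far below 1 corresponds to the mass-sending pattern of Fig. 2A, while a ratio near 1 corresponds to the balanced pattern of Fig. 2B.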
In the second step, two-degree subgraph analysis is performed. As shown in Fig. 3A and Fig. 3B, in addition to the contact behavior between the caller or SMS sender (the center node in the figure) and its direct contacts (all nodes directly connected to the center node), a two-degree subgraph also shows the contact behavior among those direct contacts. That is, a two-degree subgraph can show the communication among all direct contacts of a potential spam SMS sender. Fig. 3A shows the two-degree subgraph of a potential spam SMS sender: the center node represents the potential spam SMS sender, the highlighted edges connected to the center node represent its contact behavior with its direct contacts (the potential victims), and the arrow on each edge indicates the sending direction. In Fig. 3A there are essentially no edges between the nodes representing the direct contacts, so the highlighted edges still form a star. By contrast, Fig. 3B shows the two-degree subgraph of a normal SMS user: the center node represents a specific normal SMS user (the sender of interest), the highlighted edges connected to the center node represent that user's contact behavior with its direct contacts, and the arrow on each edge indicates the sending direction. In Fig. 3B there may also be directed edges between the nodes representing the direct contacts, so the highlighted edges form a mesh. Comparing Fig. 3A and Fig. 3B shows that, for a normal SMS user (the sender of interest), there may be many contacts among its direct contacts (the highlighted edges in Fig. 3B develop into a mesh), because the direct contacts of a normal SMS user are often friends with one another. For a potential spam SMS sender, however, the SMS recipients (direct contacts) are chosen at random or taken from a target list, so there is rarely any contact among them (mesh-forming highlighted edges rarely appear in Fig. 3A, which remains star-shaped).
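The star-versus-mesh distinction can be made concrete by collecting the edges that run between the direct contacts of the center node. This Python sketch assumes the same edge dictionary representation as before, (source, destination) mapped to message count; the function name is hypothetical.

```python
def contact_edges(edge_weights, center):
    """Return (contacts, edges_between_contacts) for the center node.

    contacts: nodes that sent to or received from the center node.
    edges_between_contacts: the directed edges, with their weights, that
    connect two different direct contacts of the center node.
    """
    contacts = set()
    for (src, dst) in edge_weights:
        if src == center and dst != center:
            contacts.add(dst)
        elif dst == center and src != center:
            contacts.add(src)
    between = {
        (src, dst): weight
        for (src, dst), weight in edge_weights.items()
        if src in contacts and dst in contacts and src != dst
    }
    return contacts, between


# A star (no edges among contacts) versus a mesh (contacts talk to each other).
star = {("s", "a"): 1, ("s", "b"): 1}
mesh = {("s", "a"): 1, ("s", "b"): 1, ("a", "b"): 2}
star_contacts, star_between = contact_edges(star, "s")
mesh_contacts, mesh_between = contact_edges(mesh, "s")
```

An empty or nearly empty result for the edges between contacts corresponds to the star shape of Fig. 3A; a populated result corresponds to the mesh of Fig. 3B.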
In the third step, three-degree subgraph analysis is performed, that is, the contact behavior in the deeper social network of the direct contacts is considered, so that normal SMS users and spam SMS senders can be distinguished more accurately by comparing differences in topology. Specifically, in addition to the contact behavior between the caller or SMS sender (the center node in the figures above) and its direct contacts (the nodes directly connected to the center node), and the contact behavior among those direct contacts, a three-degree subgraph also shows the deeper contacts of the direct contacts. That is, it also shows the two-degree graph centered on each direct contact, reflecting each contact's own social network beyond the direct communication among the direct contacts.
The analysis can be deepened further by extending the steps above, bringing in deeper levels of social network contact.
By analyzing the spatial behavior pattern (telecom social network) of SMS users, real spam SMS senders can be identified with relatively high accuracy, which compensates for the weakness (low accuracy) of time behavior pattern analysis. Spatial behavior pattern analysis requires building the SMS sender's social network graph, which is more difficult and time-consuming than event stream analysis; it is therefore unsuitable for handling a large number of senders of interest at the same time.
Offline pattern mining module 2 comprises transaction manager 21 and pattern mining engine 22.
Transaction manager 21 performs data preprocessing. As shown in Fig. 1, all historical SMS service data from charging system 5 and business intelligence (BI) system 6 enters transaction manager 21 through data access layer 4 and is filtered, that is, only the relevant fields of each SMS service record are retained. The relevant fields include, but are not limited to, the following attributes: sender ID (identifier), recipient ID, and sending time. These attributes are then summarized into features for further learning. Each resulting data record summarizes the behavior of an SMS sender number within some minimum time period (for example, 10 seconds from the appearance of the first message). The fields of the resulting data include, but are not limited to: sender ID, number of distinct recipient numbers, total number of messages sent, sending frequency, and variation in frequency. Together these elements form the features, also called SMS sending events.
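The summarization step can be sketched as follows in Python. The record layout (sender, recipient, send time in seconds), the output field names, and the choice of messages per second as the frequency unit are assumptions for illustration; the variation in frequency is omitted to keep the sketch short.

```python
from collections import defaultdict


def summarize_senders(records):
    """Summarize raw SMS records into one feature record per sender.

    records: iterable of (sender_id, recipient_id, send_time_seconds).
    """
    grouped = defaultdict(list)
    for sender, recipient, send_time in records:
        grouped[sender].append((send_time, recipient))

    summaries = {}
    for sender, items in grouped.items():
        times = [send_time for send_time, _ in items]
        span = max(times) - min(times)
        summaries[sender] = {
            "distinct_recipients": len({recipient for _, recipient in items}),
            "total_messages": len(items),
            # Messages per second over the observed span; None when all
            # messages share a single timestamp.
            "frequency": len(items) / span if span > 0 else None,
        }
    return summaries


example_records = [("a", "x", 0), ("a", "y", 5), ("a", "x", 10), ("b", "z", 3)]
sender_summaries = summarize_senders(example_records)
```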
Pattern mining engine 22 performs pattern mining and produces features, parameters, and rules. It receives events from transaction manager 21, clusters them with a suitable clustering algorithm, and outputs the clustered event sets. The operator can visualize the results (attributes) of the clustered events and then use this knowledge to find or verify spam SMS sender numbers of a certain type and their behavior patterns, that is, rules. In other words, for spatial behavior patterns, pattern mining engine 22 uses machine learning methods to perform cluster analysis and verification on all features, find an effective feature subset, and derive suitable parameters and rules; for time behavior patterns, such as sending frequency and sending quantity, it uses machine learning methods to analyze the data and derive suitable parameters and rules. The operator can also add or modify features, parameters, and rules manually, for example to meet particular requirements.
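The patent does not name a clustering algorithm. As one possible illustration only, the Python sketch below applies a simple two-cluster k-means to a single feature (for example the in-degree to out-degree ratio); the function name and the choice of k-means are assumptions.

```python
def two_means_1d(values, iterations=20):
    """Split a non-empty list of numbers into two clusters with 1-D k-means.

    The two centers start at the minimum and maximum value. Returns a tuple
    (low_cluster, high_cluster) of lists in input order.
    """
    low, high = min(values), max(values)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for value in values:
            # Assign each value to the nearer of the two centers.
            nearer_low = abs(value - low) <= abs(value - high)
            groups[0 if nearer_low else 1].append(value)
        new_low = sum(groups[0]) / len(groups[0])
        new_high = sum(groups[1]) / len(groups[1]) if groups[1] else high
        if (new_low, new_high) == (low, high):
            break  # Centers stopped moving, so the clusters are stable.
        low, high = new_low, new_high
    return groups


ratios = [0.001, 0.002, 0.005, 0.9, 1.0, 1.1]
low_cluster, high_cluster = two_means_1d(ratios)
```

The range covered by a cluster that the operator verifies as spam senders could then serve as a candidate parameter for a rule.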
The features used in spatial behavior pattern analysis (spatial features) combine features of the one-degree, two-degree, and three-degree subgraphs, and include, but are not limited to, the following:
● In-degree of the center node: the number of SMS messages the center number receives in a given time period;
● Out-degree of the center node: the number of SMS messages the center number sends in a given time period;
● In-degree to out-degree ratio of the center node: the ratio of SMS messages received to SMS messages sent by the center number in a given time period;
● Proportion of bidirectional edges among all edges connected to the center node: the proportion of the center number's SMS exchanges in a given time period in which a message was both received and answered (the order of sending and receiving does not matter);
● Average weight of all edges connected to the center node: the average number of SMS messages the center number sends to or receives from each direct contact (a number in direct contact with the center number during the time period) in a given time period;
● Maximum weight among all edges connected to the center node: the largest number of SMS messages the center number sends to or receives from any one direct contact (as defined above) in a given time period;
● Variance of the weights of all edges connected to the center node: how widely the number of SMS messages sent to or received from each direct contact (as defined above) varies in a given time period;
● Number of edges between all direct-contact nodes of the center node: the number of SMS sending or receiving links among the center number's direct contacts (as defined above) in a given time period;
● Average weight of the edges between all direct-contact nodes of the center node: the average number of SMS messages sent or received among the center number's direct contacts (as defined above) in a given time period; and
● Sum of the weights of the edges between all direct-contact nodes of the center node: the total number of SMS messages sent or received among the center number's direct contacts (as defined above) in a given time period.
The spatial features above are explained using a social network built from SMS records as an example. If the telecom social network is built from voice call records, "sending an SMS message" in the explanations above is replaced by "making a call", and "receiving an SMS message" by "receiving a call".
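The edge-weight features of the center node can be computed as in the Python sketch below. It assumes an edge dictionary from (source, destination) to message count, and it treats the weight of the connection to a contact as messages sent plus messages received; that definition, the feature names, and the function name are assumptions for illustration.

```python
from statistics import mean, pvariance


def center_edge_features(edge_weights, center):
    """Compute weight statistics for the connections of the center node.

    Returns None when the center node has no direct contacts.
    """
    sent, received = {}, {}
    for (src, dst), weight in edge_weights.items():
        if src == center and dst != center:
            sent[dst] = sent.get(dst, 0) + weight
        elif dst == center and src != center:
            received[src] = received.get(src, 0) + weight

    contacts = set(sent) | set(received)
    if not contacts:
        return None
    weights = [sent.get(c, 0) + received.get(c, 0) for c in contacts]
    two_way = sum(1 for c in contacts if c in sent and c in received)
    return {
        "two_way_proportion": two_way / len(contacts),
        "average_weight": mean(weights),
        "max_weight": max(weights),
        "weight_variance": pvariance(weights),
    }


edges = {("s", "a"): 3, ("a", "s"): 1, ("s", "b"): 2}
features = center_edge_features(edges, "s")
```

The population variance is used here because the contacts of a number in a period are the whole population of interest rather than a sample; this too is a choice made for the sketch.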
Take the in-degree to out-degree ratio feature as an example. Cluster analysis may show that separating normal users from spam SMS senders at a ratio of 0.001-0.01 is 90% accurate. Then 0.001-0.01 is called the parameter, and the statement "if the in-degree to out-degree ratio falls within the 0.001-0.01 interval, the number can be judged to be a spam SMS sender with 90% probability" is called the rule. One of the above features, or a combination of them, can be selected for spatial behavior pattern analysis as required.
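The example rule translates directly into code. In this Python sketch the interval 0.001-0.01 and the 90% probability come from the text above, while the function name and the convention of returning None when the rule does not fire are assumptions.

```python
def ratio_rule(in_degree, out_degree, low=0.001, high=0.01, probability=0.9):
    """Return the spam probability given by the ratio rule, or None.

    None means the rule does not fire: either nothing was sent, or the
    in-degree to out-degree ratio lies outside the [low, high] interval.
    """
    if out_degree == 0:
        return None
    ratio = in_degree / out_degree
    return probability if low <= ratio <= high else None


# 5 received against 1000 sent gives a ratio of 0.005, inside the interval.
spam_like = ratio_rule(5, 1000)
# 50 received against 100 sent gives a ratio of 0.5, outside the interval.
normal_like = ratio_rule(50, 100)
```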
Fig. 4A and Fig. 4B illustrate flowcharts of the method according to one embodiment of the invention: Fig. 4A shows the online detection flow, and Fig. 4B shows the offline pattern mining flow. The method according to one embodiment of the invention is described below with reference to Fig. 1, Fig. 4A, and Fig. 4B.
As shown in Fig. 4A, in step S42, event handling engine 11 obtains the latest communication data, that is, the SMS event stream, from SMS center 3 in real time at regular intervals. In step S44, event handling engine 11 performs event-stream-based rule matching on the SMS stream: it performs temporal analysis on the SMS event stream data according to the parameters and rules determined by offline pattern mining module 2, filters out numbers engaged in normal communication, and extracts suspicious numbers (potential spam SMS sender numbers), which it outputs to graph analysis engine 12. In step S46, for each suspicious number, graph analysis engine 12 obtains historical social network data through data access layer 4 from related systems such as charging system 5, BI system 6, and SMS center 3, and, using the features, parameters, and rules obtained from offline pattern mining module 2, performs spatial behavior pattern analysis; that is, it makes a comprehensive judgment on the social network of each sending number of interest (suspicious number) and then outputs a spam suspicion probability for each suspicious number. In step S48, whether a suspicious number is a spam SMS sender number is judged from its spam suspicion probability. The operator's staff can confirm manually, based on this probability, whether a suspicious number is a spam SMS sender number, or the judgment can be made automatically with a threshold; for example, if the spam suspicion probability exceeds 60%, the suspicious number is automatically judged to be a spam SMS sender number.
The process by which graph analysis engine 12 performs spatial behavior pattern analysis in step S46 is described further below. As stated above, this process requires a comprehensive judgment on the social network of the sending number of interest (suspicious number), which can be implemented in several ways. For example, a rule-based weighted-sum judgment can be used:
1. Compute the values of all the relevant features above from the social network data of the sending number of interest;
2. For each feature, compare the computed value with the parameters and rules of that feature to obtain a probability corresponding to the feature;
3. Take the weighted sum of the probabilities of all features to obtain the final probability, that is, the spam suspicion probability.
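The weighted-sum step can be sketched as below in Python. The feature names, the weight values, and the normalization by the total weight (so that the result stays between 0 and 1) are assumptions for illustration; the patent does not state how the weights are chosen.

```python
def weighted_spam_probability(feature_probabilities, weights):
    """Combine per-feature probabilities into one spam suspicion probability.

    feature_probabilities: dict feature name -> probability from its rule.
    weights: dict feature name -> non-negative weight (hypothetical values).
    The weighted sum is divided by the total weight of the features used.
    """
    total_weight = sum(weights[name] for name in feature_probabilities)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(
        probability * weights[name]
        for name, probability in feature_probabilities.items()
    )
    return weighted_sum / total_weight


probabilities = {"in_out_ratio": 0.9, "two_way_proportion": 0.5}
feature_weights = {"in_out_ratio": 3, "two_way_proportion": 1}
suspicion = weighted_spam_probability(probabilities, feature_weights)
# Automatic judgment with the 60% threshold mentioned for step S48.
is_spam_sender = suspicion > 0.6
```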
Alternatively, a judgment based on a machine-learned classifier can be used:
1. Using offline data as training samples, train with the selected classification method (classifier, such as a neural network) to obtain a classifier for spam SMS sender number detection;
2. Compute the values of all the relevant features above from the social network data of the sending number of interest;
3. Feed the sequence (vector) of all computed feature values to the classifier as input for classification;
4. The output of the classifier is the final probability, that is, the spam suspicion probability; it can also be a binary result of 0 (is a spam SMS sender number) or 1 (is not a spam SMS sender number).
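The patent mentions a neural network only as one example of a classifier. The Python sketch below substitutes a plain logistic regression trained by stochastic gradient descent, which is a much simpler stand-in. The feature vectors, the training data, and the hyperparameters are invented for illustration, and the label convention here (1 means spam sender, so the output is a spam probability) is an assumption that differs from the 0/1 encoding mentioned in step 4.

```python
import math


def train_logistic(samples, labels, epochs=2000, learning_rate=0.5):
    """Train a logistic regression model with stochastic gradient descent.

    samples: list of feature vectors; labels: 1 for spam sender, 0 for normal.
    Returns (weights, bias).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = predicted - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_probability(model, x):
    """Return the spam suspicion probability for feature vector x."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature vectors: [two_way_proportion, in_out_ratio].
training_samples = [[0.0, 0.01], [0.05, 0.02], [0.8, 0.9], [0.6, 1.1]]
training_labels = [1, 1, 0, 0]
model = train_logistic(training_samples, training_labels)
spam_score = predict_probability(model, [0.02, 0.01])
normal_score = predict_probability(model, [0.7, 1.0])
```

A production system would normally use a tested machine learning library and far more training data; the point of the sketch is only the flow of training on offline data, building a feature vector, and reading a probability from the classifier.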
For the offline processing flow, as shown in Fig. 4B, in step S52, transaction manager 21 obtains historical communication data, including calls and SMS messages, through data access layer 4 from related systems such as charging system 5, BI system 6, and SMS center 3. In step S54, transaction manager 21 preprocesses the communication data according to its format, that is, retains only the relevant fields and forms features, which it outputs to pattern mining engine 22. In step S56, pattern mining engine 22 performs temporal and spatial behavior pattern analysis and learning on the preprocessed features. Based on the analysis results, the operator decides which time parameters and rules, and which spatial features, parameters, and rules, are used in online detection module 1. As stated above, the operator can also modify the parameters and rules manually as required.
This application describes only specific embodiments and implementations of the invention. Based on what is described here, various improvements, variations, and other embodiments and implementations can be made.
For example, besides the system according to one embodiment shown in Fig. 1 and the method according to one embodiment shown in Fig. 4A and Fig. 4B, another embodiment of the system according to the invention may comprise only event handling engine 11 and graph analysis engine 12, and another embodiment of the method according to the invention may comprise only the online detection flow shown in Fig. 4A.

Claims (10)

1. A system for detecting a spam SMS sender number in real time, comprising:
an event handling engine configured to acquire a short message event stream in real time, perform rule matching according to predetermined time parameters and their rules, and extract a potential spam SMS sender number; and
a graph analysis engine configured to receive the number extracted by the event handling engine, then obtain social network data of the number, and perform spatial behavior pattern analysis in combination with at least one predetermined spatial feature and its parameters and rules, so as to determine whether the number is a real spam SMS sender number,
wherein the predetermined spatial features comprise the following features of the one-degree, two-degree, and three-degree subgraphs of the social network:
the in-degree of the center node;
the out-degree of the center node;
the ratio of in-degree to out-degree of the center node;
the proportion of bidirectional edges among all edges connected to the center node;
the average weight of all edges connected to the center node;
the maximum weight of all edges connected to the center node;
the variance of the weights of all edges connected to the center node;
the number of edges between the nodes of all direct contacts of the center node;
the average weight of the edges between the nodes of all direct contacts of the center node; and
the sum of the weights of the edges between the nodes of all direct contacts of the center node.
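As an illustration of the features listed in claim 1, the following Python sketch computes them on the one-degree subgraph around a center node. The graph representation (a mapping from directed edge to weight), the function name, and the handling of empty cases are assumptions of this sketch and are not specified by the claims; the two-degree and three-degree subgraphs are not covered.

```python
from statistics import mean, pvariance


def spatial_features(edges, center):
    """Compute the listed features for `center` on its one-degree subgraph.

    `edges` maps (source, target) -> weight for a directed graph.
    """
    incoming = {s: w for (s, t), w in edges.items() if t == center and s != center}
    outgoing = {t: w for (s, t), w in edges.items() if s == center and t != center}
    contacts = set(incoming) | set(outgoing)

    # All edges that touch the center node, in either direction.
    incident = list(incoming.values()) + list(outgoing.values())
    # An edge is bidirectional when the reverse edge also exists,
    # so each mutual contact contributes two bidirectional edges.
    two_way = 2 * len(set(incoming) & set(outgoing))
    # Edges among the direct contacts themselves (the center is excluded).
    between = [w for (s, t), w in edges.items()
               if s in contacts and t in contacts and s != t]

    return {
        "in_degree": len(incoming),
        "out_degree": len(outgoing),
        # Undefined when the center sends nothing; None marks that case.
        "in_out_ratio": len(incoming) / len(outgoing) if outgoing else None,
        "bidirectional_fraction": two_way / len(incident) if incident else 0.0,
        "avg_weight": mean(incident) if incident else 0.0,
        "max_weight": max(incident, default=0),
        "weight_variance": pvariance(incident) if incident else 0.0,
        "contact_edge_count": len(between),
        "contact_avg_weight": mean(between) if between else 0.0,
        "contact_weight_sum": sum(between),
    }
```

Intuitively, a number that sends to many recipients who rarely reply and who do not message each other would show a high out-degree, a low bidirectional fraction, and few edges among its contacts.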
2. The system according to claim 1, wherein the spatial behavior pattern analysis performed by the graph analysis engine comprises:
calculating the values of the spatial features from the social network data of the number; comparing each calculated value with the parameters and rules of the corresponding spatial feature to obtain a probability for that spatial feature; and then computing a weighted sum of the probabilities of all spatial features to obtain a spam SMS suspicion probability.
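The weighted-sum scoring of claim 2 can be sketched in Python as follows. The representation of a rule as a function from feature value to probability, the feature names, and the normalization of the weights are assumptions of this sketch; the claim does not prescribe a particular rule form.

```python
def suspicion_probability(values, rules, weights):
    """Weighted sum of per-feature probabilities.

    values:  feature name -> computed feature value
    rules:   feature name -> function(value) returning a probability in [0, 1]
    weights: feature name -> non-negative weight

    Weights are normalized here (an assumption) so the result stays in [0, 1].
    """
    total = sum(weights[name] for name in rules)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * rules[name](values[name]) for name in rules) / total
```

For example, a hypothetical rule for the out-degree could return 1.0 above some threshold and 0.0 otherwise, while a rule for the bidirectional edge proportion could return one minus the proportion; the thresholds and weights would come from the offline analysis or from the operator.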
3. The system according to claim 1, wherein the spatial behavior pattern analysis performed by the graph analysis engine comprises:
training, with the social network data of the number as training samples, according to a selected classification method to obtain a classifier for spam SMS sender number detection; calculating the values of the spatial features from the social network data of the number; and then inputting the sequence formed by all calculated values into the classifier for discrimination to obtain a spam SMS suspicion probability.
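The classifier-based variant of claim 3 leaves the classification method open. Purely as an illustration, the Python sketch below uses logistic regression trained by batch gradient descent; the choice of method, the function names, the learning rate, and the assumption that labeled feature vectors are available are all assumptions of this sketch, not part of the claim.

```python
import math


def predict(w, b, x):
    """Return the suspicion probability for feature vector `x`."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent.

    samples: list of feature vectors (sequences of calculated feature values)
    labels:  1 for a known spam sender, 0 for a normal number
    """
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

The feature vector passed to predict must list the features in the same order as the training samples. Scaling the features to comparable ranges before training is advisable for gradient descent, although the claim does not address it.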
4. The system according to claim 1, further comprising:
a transaction manager configured to obtain historical communication data and preprocess it into feature data; and
a pattern mining engine configured to receive the feature data and perform temporal and spatial behavior pattern analysis and learning on the feature data, producing the predetermined time parameters and rules and the predetermined spatial features and their parameters and rules.
5. The system according to claim 4, wherein, for spatial behavior patterns, the pattern mining engine performs cluster analysis and validation on all features to find an effective feature subset and derive suitable parameters and rules.
6. A method for detecting a spam SMS sender number in real time, comprising:
acquiring a short message event stream in real time, performing rule matching according to predetermined time parameters and their rules, and extracting a potential spam SMS sender number; and
obtaining social network data of the extracted number, and performing spatial behavior pattern analysis in combination with at least one predetermined spatial feature and its parameters and rules, so as to determine whether the number is a real spam SMS sender number,
wherein the predetermined spatial features comprise the following features of the one-degree, two-degree, and three-degree subgraphs of the social network:
the in-degree of the center node;
the out-degree of the center node;
the ratio of in-degree to out-degree of the center node;
the proportion of bidirectional edges among all edges connected to the center node;
the average weight of all edges connected to the center node;
the maximum weight of all edges connected to the center node;
the variance of the weights of all edges connected to the center node;
the number of edges between the nodes of all direct contacts of the center node;
the average weight of the edges between the nodes of all direct contacts of the center node; and
the sum of the weights of the edges between the nodes of all direct contacts of the center node.
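The first step of the method of claim 6, rule matching on the short message event stream according to time parameters, can be sketched in Python as a sliding-window count per sender. The class name, the single rule form (at least a given number of messages within a time window), and the parameter names are assumptions of this sketch; the claim does not specify the form of the time rules.

```python
from collections import defaultdict, deque


class TimeRuleMatcher:
    """Flag a sender that sends at least `threshold` messages within `window_seconds`.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._recent = defaultdict(deque)  # sender -> timestamps inside the window

    def on_event(self, sender, timestamp):
        """Process one SMS event; return True if the sender matches the rule."""
        queue = self._recent[sender]
        queue.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - queue[0] > self.window_seconds:
            queue.popleft()
        return len(queue) >= self.threshold
```

Numbers flagged by such a matcher are only potential spam senders; under the method they would then be passed to the spatial behavior pattern analysis for confirmation.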
7. The method according to claim 6, wherein performing the spatial behavior pattern analysis comprises:
calculating the values of the spatial features from the social network data of the number; comparing each calculated value with the parameters and rules of the corresponding spatial feature to obtain a probability for that spatial feature; and then computing a weighted sum of the probabilities of all spatial features to obtain a spam SMS suspicion probability.
8. The method according to claim 6, wherein performing the spatial behavior pattern analysis comprises:
training, with the social network data of the number as training samples, according to a selected classification method to obtain a classifier for spam SMS sender number detection; calculating the values of the spatial features from the social network data of the number; and then inputting the sequence formed by all calculated values into the classifier for discrimination to obtain a spam SMS suspicion probability.
9. The method according to claim 6, further comprising:
obtaining historical communication data and preprocessing it into feature data; and
performing temporal and spatial behavior pattern analysis and learning on the feature data to produce the predetermined time parameters and rules and the predetermined spatial features and their parameters and rules.
10. The method according to claim 9, wherein, for spatial behavior patterns, cluster analysis and validation are performed on all features to find an effective feature subset and derive suitable parameters and rules.
CN 200810168774 2008-09-28 2008-09-28 System and method for detecting spam SMS sender number in real time Expired - Fee Related CN101686444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810168774 CN101686444B (en) 2008-09-28 2008-09-28 System and method for detecting spam SMS sender number in real time

Publications (2)

Publication Number Publication Date
CN101686444A CN101686444A (en) 2010-03-31
CN101686444B true CN101686444B (en) 2012-12-26

Family

ID=42049348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810168774 Expired - Fee Related CN101686444B (en) 2008-09-28 2008-09-28 System and method for detecting spam SMS sender number in real time

Country Status (1)

Country Link
CN (1) CN101686444B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101909261A (en) * 2010-08-10 2010-12-08 中兴通讯股份有限公司 Method and system for monitoring spam
CN102857921B (en) 2011-06-30 2016-03-30 国际商业机器公司 Judge method and the device of spammer
CN102510400B (en) * 2011-10-31 2015-09-30 百度在线网络技术(北京)有限公司 A kind of method of the suspectableness degree for determining user, device and equipment
CN103686639A (en) * 2013-12-04 2014-03-26 华为技术有限公司 Message processing method, device and system
CN104714947A (en) * 2013-12-11 2015-06-17 深圳市腾讯计算机系统有限公司 Preset type number recognition method and device
CN105516941A (en) * 2014-10-13 2016-04-20 中兴通讯股份有限公司 Interception method and device of spam messages
CN108391240B (en) * 2018-05-23 2021-08-24 中国联合网络通信集团有限公司 Junk multimedia message judgment method and device
CN108769933B (en) * 2018-05-31 2021-06-04 中国联合网络通信集团有限公司 Multimedia message identification method and multimedia message identification system
CN110913353B (en) * 2018-09-17 2022-01-18 阿里巴巴集团控股有限公司 Short message classification method and device
CN111124698A (en) * 2018-10-30 2020-05-08 北京奇虎科技有限公司 Communication event identification method and device, electronic equipment and readable storage medium
CN113839962B (en) * 2021-11-25 2022-05-06 阿里云计算有限公司 User attribute determination method, apparatus, storage medium, and program product

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1520214A (en) * 2003-09-02 2004-08-11 �ź㴫 Firewall system for short message and method for building up firewall
CN1545355A (en) * 2003-11-18 2004-11-10 海信集团有限公司 Method for negative receiving of short message for handset

Also Published As

Publication number Publication date
CN101686444A (en) 2010-03-31

Similar Documents

Publication Publication Date Title
CN101686444B (en) System and method for detecting spam SMS sender number in real time
CN109600752B (en) Deep clustering fraud detection method and device
CN106550155B (en) Swindle sample is carried out to suspicious number and screens the method and system sorted out and intercepted
CN106791220B (en) Method and system for preventing telephone fraud
CN109451182B (en) Detection method and device for fraud telephone
Becker et al. Fraud detection in telecommunications: History and lessons learned
Barson et al. The detection of fraud in mobile phone networks
CN110248322B (en) Fraud group partner identification system and identification method based on fraud short messages
CN107197463A (en) A kind of detection method of telephone fraud, storage medium and electronic equipment
Wang et al. A behavior-based SMS antispam system
CN101860822A (en) Method and system for monitoring spam messages
CN106970911A (en) A kind of strick precaution telecommunication fraud system and method based on big data and machine learning
CN108881263A (en) A kind of network attack result detection method and system
CN102802133A (en) Junk information identification method, device and system
CN110267272A (en) A kind of fraud text message recognition methods and identifying system
CN101389085B (en) Rubbish short message recognition system and method based on sending behavior
CN110493476B (en) Detection method, device, server and storage medium
CN107819747A (en) A kind of telecommunication fraud correlation analysis system and method based on communication event sequence
CN101909261A (en) Method and system for monitoring spam
CN111131627B (en) Method, device and readable medium for detecting personal harmful call based on streaming data atlas
CN111917574A (en) Social network topology model and construction method thereof, user confidence degree and intimacy degree calculation method and telecommunication fraud intelligent interception system
CN108198086B (en) Method and device for identifying disturbance source according to communication behavior characteristics
CN111105064A (en) Method and device for determining suspected information of fraud event
Alraouji et al. International call fraud detection systems and techniques
CN109587357B (en) Crank call identification method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121226

Termination date: 20160928
