WO2012019386A1

WO2012019386A1 - Method and system for monitoring spam short messages

Info

Publication number: WO2012019386A1
Application number: PCT/CN2010/078516
Authority: WO
Inventors: 王飞; 谢钢锋; 邢刚; 冯亚军
Original assignee: 中兴通讯股份有限公司
Priority date: 2010-08-10
Filing date: 2010-11-08
Publication date: 2012-02-16
Also published as: CN101909261A

Abstract

The present invention discloses a method for monitoring spam short messages. The method includes detecting, according to preset rules, whether a short message sender is a spam short message sender and, if so, adding that sender to a blacklist so that spam short messages can be monitored. The preset rules include at least the following: within a preset time period, if the time-sequence characteristic of the sender's short messages is a preset time-sequence characteristic, and/or if the ratio of the number of pairs having mutual communication records, among the sender and all of the sender's recipients, to the total number of pairs is smaller than a preset value, the sender is judged to be a spam short message sender. A system for monitoring spam short messages is also disclosed. The invention achieves higher precision and recall, which raises the cost of evasion for spam short message producers, and it does not need to scan the content of short messages, which greatly improves system performance.

Description

Method and system for monitoring spam messages

TECHNICAL FIELD The present invention relates to short message services in the field of mobile communications, and in particular, to a spam short message monitoring system and method based on sender behavior.

BACKGROUND According to statistics, the number of mobile phone users in China has exceeded 600 million, and on average more than 650 million text messages pass between users' thumbs. With the popularity of mobile phones and the rapid development of short message services, people enjoy a fast and convenient means of communication, but they also receive more and more spam messages. The root cause of spam messages is that the cost of sending short messages is extremely low while the advertising benefit is very high. Spam messages not only affect operators' networks but also seriously damage users' interests, and can even cause serious adverse social effects. To govern spam messages, other countries mainly rely on legislation, together with advanced technical means to identify and deal with fraudulent messages and phones and to combat mobile phone crime. In China, the task of preventing and controlling spam messages is mainly led and carried out by operators, usually through technical and management measures, and legislation is still lacking. Among the commonly used spam short message monitoring technologies, the main one is spam filtering. In principle, it can be divided into black and white list filtering, traffic-based filtering, and keyword-based content filtering. The blacklist-based filtering method collects the calling numbers of known spam makers into a blacklist and deploys it in the short message center or the short message gateway, which then rejects short messages from the blacklisted calling numbers. The blacklist can block by number segment or by individual number. Whitelisted calling numbers are not blocked in any way. The traffic-based filtering method collects statistics on the number of messages a user sends in bulk within a certain period of time.
When the quantity sent exceeds a preset threshold, the sender is manually or automatically added to the blacklist. The keyword-based content filtering method runs keyword queries on the short message content; once a keyword is hit, the sending number is added to the blacklist. Both traffic-based filtering and keyword-based content filtering have drawbacks. The traffic-based approach is easily evaded by "sending a small number of messages from each of many phones". In addition, now that many mobile phone terminals support group sending, it easily misjudges large numbers of legitimate messages such as festival greetings, which increases the user complaint rate. Keyword-based methods can be circumvented by means of "homophones", "typos", "structural splits", and "character variations". At present, operators have deployed a large number of spam monitoring systems. There are two important indicators for evaluating the monitoring effect of such a system: precision and recall. Precision is the proportion of actual spam senders among the senders in the detected spam sender list. Recall is the ratio of the number of spam senders detected to the actual number of spam senders on the network. Obviously, a good spam monitoring system has high precision and high recall. At present, the indicators achieved by the spam monitoring systems that operators have deployed, whether based on the above traditional technologies or on improvements to them, are not ideal, and operators have to rely on a large amount of human labor to assist in inspecting spam messages. Therefore, how to improve the precision and recall of spam message monitoring has become an urgent problem to be solved.
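
The two indicators defined above can be written down directly. The following is a minimal sketch, assuming the flagged senders and the true spam senders are available as sets of phone numbers; the function name `precision_recall` and the sample values are illustrative and do not come from the specification.

```python
def precision_recall(flagged: set[str], actual_spammers: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a detected spam sender list.

    precision = correctly flagged senders / all flagged senders
    recall    = correctly flagged senders / all actual spam senders
    """
    true_positives = len(flagged & actual_spammers)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual_spammers) if actual_spammers else 0.0
    return precision, recall


# Four senders were flagged, two of them correctly; one real spammer was missed.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(f"precision={p:.2f} recall={r:.2f}")
```

A system that flags very few senders can reach high precision with low recall, which is why the text evaluates both indicators together.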

SUMMARY OF THE INVENTION The technical problem to be solved by the present invention is to provide a method and system for monitoring spam short messages that improve the precision and recall of spam message monitoring.

To solve the above technical problem, the present invention provides a method for monitoring spam short messages, the method comprising: if a short message sender is detected to be a spam sender according to a predetermined rule, adding the short message sender to a blacklist so that spam short messages can be monitored. The predetermined rule at least includes: if the timing feature with which the short message sender sends short messages within a predetermined time period is a predetermined timing feature, the short message sender is determined to be a spam sender; or if, among the pairwise combinations of the short message sender and all recipients of the sender's short messages within the predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairs is less than a predetermined value, the short message sender is determined to be a spam sender; or if the timing feature with which the short message sender sends short messages within the predetermined time period is the predetermined timing feature, and the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations of the sender and all recipients within the predetermined time period is less than the predetermined value, the short message sender is determined to be a spam sender.

Before the step of detecting the short message sender to be a spam sender according to the predetermined rule, the method further includes: extracting historical short message records of known spam senders, and training the predetermined timing feature from the frequency features with which the known spam senders sent short messages in the historical short message records; and/or connecting with edges the nodes that have mutual communication records in the historical short message records, so as to construct a social relationship network graph of a known spam sender and all recipients of that sender's short messages, and training the predetermined value from the ratio of the number of edges to the total number of edges that would connect all of the nodes.

Before the step of detecting the short message sender to be a spam sender according to the predetermined rule, the method further includes: detecting that the number of short messages sent by the short message sender per unit time exceeds a threshold.

The step of detecting the short message sender to be a spam sender according to the predetermined rule includes: detecting online the short message CDRs of the short message sender within the current time period and, if the timing feature with which the sender sends short messages is detected to be the predetermined timing feature, determining that the short message sender is a spam sender; or detecting online the short message CDRs of the short message sender within the current time period and, if the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations of the sender and all recipients is detected to be less than the predetermined value, determining that the short message sender is a spam sender; or detecting online the short message CDRs of the short message sender within the current time period and, if the timing feature is detected to be the predetermined timing feature and the ratio is detected to be less than the predetermined value, determining that the short message sender is a spam sender.

Before the step of detecting the short message sender to be a spam sender according to the predetermined rule, the method further includes: extracting the short message CDRs of the short message sender within the current time period, and preprocessing the short message CDRs.

Before the step of detecting the short message sender to be a spam sender according to the predetermined rule, the method further includes: detecting that the short message sender is on neither the blacklist nor the whitelist.

To solve the above technical problem, the present invention also provides a system for monitoring spam short messages, the system comprising: a detecting module, configured to: if a short message sender is detected to be a spam sender according to a predetermined rule, add the short message sender to a blacklist, and then send the blacklist to the monitoring module;

The monitoring module is configured to: monitor spam short messages according to the blacklist. The predetermined rule at least includes: if the timing feature with which the short message sender sends short messages within a predetermined time period is detected to be a predetermined timing feature, the short message sender is determined to be a spam sender; or if, among the pairwise combinations of the short message sender and all recipients of the sender's short messages within the predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairs is less than a predetermined value, the short message sender is determined to be a spam sender; or if the timing feature with which the short message sender sends short messages within the predetermined time period is the predetermined timing feature, and the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations of the sender and all recipients within the predetermined time period is less than the predetermined value, the short message sender is determined to be a spam sender.

The system further includes: a training module, configured to: extract historical short message records of known spam senders, train the predetermined timing feature from the frequency features with which the known spam senders sent short messages in the historical short message records, and then send the predetermined timing feature to the detecting module; and/or connect with edges the nodes that have mutual communication records in the historical short message records, so as to construct a social relationship network graph of a known spam sender and all recipients of that sender's short messages, train the predetermined value from the ratio of the number of edges to the total number of edges that would connect all of the nodes, and then send the predetermined value to the detecting module.
The detecting module includes: an online detecting module, configured to: detect online the short message CDRs of the short message sender within the current time period and, if the timing feature with which the sender sends short messages is detected to be the predetermined timing feature, determine that the short message sender is a spam sender; or detect online the short message CDRs of the short message sender within the current time period and, if the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations of the sender and all recipients is detected to be less than the predetermined value, determine that the short message sender is a spam sender; or detect online the short message CDRs of the short message sender within the current time period and, if the timing feature is detected to be the predetermined timing feature and the ratio is detected to be less than the predetermined value, determine that the short message sender is a spam sender.

The online detecting module is further configured to: before detecting whether the short message sender is a spam sender, detect that the number of short messages sent by the short message sender per unit time exceeds a threshold.

The system further includes: a CDR preprocessing module, configured to: extract the short message CDRs of the short message sender within the current time period and preprocess them.

The detecting module is further configured to: before detecting the short message sender to be a spam sender according to the predetermined rule, detect that the short message sender is on neither the blacklist nor the whitelist.

The traditional content-based spam monitoring system achieves neither ideal precision nor ideal recall in spam filtering; it also needs to scan the content of short messages, which incurs a large system resource overhead. The method and system for spam short message monitoring provided by the present invention monitor spam short messages based on the temporal and spatial characteristics of sender behavior. They achieve high precision and recall, raise the cost of evasion for spam makers, and, because short message content does not need to be scanned, greatly improve system performance.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a schematic diagram of a spam short message monitoring system of the present invention; FIG. 2 is a flowchart of a spam short message monitoring method according to the present invention; FIG. 3 is a schematic diagram of a spam short message monitoring system according to an embodiment of the present invention; FIG. 4 is a flowchart of a spam short message monitoring method according to an embodiment of the present invention; FIG. 5 is a flowchart of training the behavior features of spam senders according to an embodiment of the present invention; FIG. 6 is a flowchart of online detection according to an embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be arbitrarily combined with each other.

The behavior of short message senders has certain temporal characteristics and spatial characteristics. For example, many spam senders use group sending to send commercial advertisements. The frequency characteristics they show in transmission timing are clearly different from those of ordinary short message senders. The frequency of machine group sending is often fixed; for example, the time interval between messages is constant. The frequency with which ordinary users send messages is not fixed and shows little regularity. Similarly, in terms of spatial characteristics, a normal short message sender has stable and unique social network characteristics, and those relationships are relatively private, whereas the social relationship network reflected by a spam sender is chaotic and unstable. Everyone has a fixed social circle, most of the people they send short messages to are within that circle, and each person's social circle is different; that is, each social network is different. The recipients of spam messages, by contrast, often have no relationship with one another. If spam senders want to circumvent monitoring based on social networks, they must obtain everyone's social network. Because each person's social network is unique and relatively private (we usually do not know what other people's social networks look like), it is difficult for spam senders to obtain the social networks of a large number of people.
The invention uses the differences in temporal characteristics and/or spatial characteristics between the behavior of spam senders and that of normal short message senders to monitor spam messages. By analyzing the temporal and spatial characteristics of spam senders, timing features and social network features are extracted, and a measurement model of the timing features and the social relationship network of spam makers is trained. The model is then used to measure the probability that a short message sender is a spam maker.

The process of training the timing features of spam makers and the measurement model of the social relationship network works as follows: a list of known spam makers is obtained, the temporal and spatial characteristics of this group are analyzed, and the features they have in common in timing and in their social networks are extracted and expressed as parameter values, which serve as a reference for checking whether other short message senders are spam senders.

The timing feature model is a set of frequency characteristic parameters for sending short messages, trained from the historical short message records of spam senders. For example, the messages sent within a certain period of time follow a certain rule in the time interval between them: if a spam sender sends a short message every 1 second, the feature is that the time interval is 1 second. Some low-frequency spam senders may deliberately set a longer sending interval in order to avoid monitoring. However, as long as the messages are group-sent by machine, there will always be a certain regularity in the sending interval.

The social relationship network feature (that is, the spatial feature model) can be reflected by the short message communication records between the sender and the recipients over a certain period of time. The social relationships among the recipients of spam messages are relatively distant; that is, there are few communication records between them. The closeness of the social relationship between a short message sender and all of the recipients can be measured by the ratio of the number of pairs having mutual communication records among all short message recipients (including the short message sender) to the total number of pairwise combinations of those recipients (including the sender). For example, two users who have sent a message and a reply to each other form one such pair. For the senders and recipients of spam messages, this ratio is generally small.
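
The timing regularity described above can be illustrated with a short sketch. It assumes that "regular" means most gaps between consecutive messages fall into the same interval bucket; the function names, the `tolerance` bucket width, and the `min_share` cut-off are hypothetical choices, not values from the specification.

```python
from collections import Counter


def dominant_interval_share(send_times: list[float], tolerance: float = 0.5) -> float:
    """Share of gaps between consecutive messages that fall in the most common bucket.

    send_times are submission timestamps in seconds; gaps are bucketed to the
    nearest multiple of `tolerance` (an assumed parameter).
    """
    if len(send_times) < 3:
        return 0.0  # too few messages to speak of a sending rhythm
    times = sorted(send_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    buckets = Counter(round(gap / tolerance) for gap in gaps)
    return buckets.most_common(1)[0][1] / len(gaps)


def looks_machine_sent(send_times: list[float], tolerance: float = 0.5,
                       min_share: float = 0.8) -> bool:
    """Treat a sender as machine-like if most gaps share one interval (assumed rule)."""
    return dominant_interval_share(send_times, tolerance) >= min_share


print(looks_machine_sent([0, 1, 2, 3, 4, 5]))       # one message every second
print(looks_machine_sent([0, 7, 9, 40, 41, 300]))   # irregular, human-like gaps
```

Because the check looks at the shape of the gap distribution rather than at volume, a sender who slows down to one message every 30 seconds would still show a dominant interval.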
A social network graph containing the short message sender and all recipients of the sender's short messages can be constructed from the historical short message records: the sender and each recipient are each regarded as one node, and nodes that have communication records with each other are connected by an edge. The node aggregation degree parameter of the graph can then be measured by the ratio of the number of edges actually present in the graph to the total number of edges there would be if every pair of nodes were connected. The more edges the graph has, the higher the degree of node aggregation; in the social network graph constructed for a spam message maker, the degree of node aggregation is usually low.

Spam senders can be divided into high-frequency senders and low-frequency senders. High-frequency senders are more harmful because they send a large number of spam messages in a short period of time. Low-frequency senders do not generate a large number of spam messages in a short period of time and therefore do not cause harm in the short term. To handle both cases, the spam monitoring system needs to detect high-frequency senders within a short time and to detect low-frequency senders within a certain longer period. To meet this requirement, the present invention combines online detection and offline detection. Online detection targets high-frequency senders and examines the data of the current time period, so it is highly timely. Offline detection examines the data of a longer period (for example, 1 week) and supplements online detection: it can find low-frequency spam senders that online detection cannot find.
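
The node aggregation measure described in this passage can be sketched as follows. The sketch assumes that a "mutual communication record" means messages were exchanged in both directions between two numbers, which is one reading of the text; the function name `aggregation_ratio` and the record layout are illustrative.

```python
from itertools import combinations


def aggregation_ratio(sender: str, records: list[tuple[str, str]]) -> float:
    """Ratio of connected node pairs to all node pairs in the sender's graph.

    records holds (from_number, to_number) pairs observed in the time window.
    Nodes are the sender plus every recipient of the sender's messages.
    Assumption: two nodes are connected only if each has messaged the other.
    """
    seen = set(records)
    nodes = {sender} | {dst for src, dst in seen if src == sender}
    if len(nodes) < 2:
        return 0.0
    pairs = list(combinations(sorted(nodes), 2))
    connected = sum(1 for a, b in pairs if (a, b) in seen and (b, a) in seen)
    return connected / len(pairs)


# A bulk sender whose recipients never reply and do not know each other.
print(aggregation_ratio("S", [("S", "A"), ("S", "B"), ("S", "C")]))
# An ordinary user whose contacts reply.
print(aggregation_ratio("U", [("U", "A"), ("A", "U"), ("U", "B"), ("B", "U"), ("A", "B")]))
```

The same function can serve both detection modes described above: online detection would pass the records of the current time period, and offline detection the records of a longer window such as one week.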
To realize spam message detection based on timing features and spatial features, offline training must first be performed on the short message transmission records of spam makers in the historical CDRs of a certain period of time, so as to obtain the timing features of the spam makers and the social relationship network measurement model. The training process includes extracting the senders' timing features and social relationship network features, performing cluster analysis, statistically deriving the patterns of spam senders, and finally generating a model file containing the parameters of those spam sending patterns. During spam detection, the timing features and social relationship network features of the sender are likewise extracted from the real-time short messages, and the similarity between this sample and the model file is calculated to determine whether the sender is a spam sender. The training process is adaptive: the system periodically samples CDRs for training and adjusts the template library.

When the system performs spam detection, the black and white lists are checked first. If the short message sender is on the blacklist or the whitelist, that user is skipped directly. Users on the blacklist have already been identified as spam senders, or are specific users whom the operator has prohibited from sending short messages, so detecting them again serves no purpose: the aim of spam monitoring is to find spam senders and add them to the blacklist, and a user already on the blacklist does not need to be checked. Similarly, whitelist users are usually users whom the operator has exempted from monitoring. No matter what kind of short messages a whitelist user sends, the spam monitoring system may not treat that user as a spam maker, so monitoring whitelist users is pointless. Next, detection based on timing features and/or spatial features is performed; online detection and offline detection can run in parallel. Finally, the blacklists derived by the different detection methods are merged, and the merged blacklist is synchronized to the Business Operation Support System (BOSS).
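
The order of operations described above (skip listed senders, run the detectors, extend the blacklist) can be sketched as below. The detectors are passed in as plain callables; `run_detection` and its parameter names are assumptions for illustration, and synchronizing the result to the BOSS is left out.

```python
from typing import Callable, Iterable


def run_detection(senders: Iterable[str],
                  blacklist: set[str],
                  whitelist: set[str],
                  detectors: list[Callable[[str], bool]]) -> set[str]:
    """Return the blacklist extended with newly detected spam senders.

    Senders already on either list are skipped without running any detector.
    A sender is flagged if at least one detector reports it (assumed policy,
    mirroring the merge of the per-detector blacklists).
    """
    newly_flagged = set()
    for sender in senders:
        if sender in blacklist or sender in whitelist:
            continue  # already classified by the operator or by an earlier run
        if any(detect(sender) for detect in detectors):
            newly_flagged.add(sender)
    return blacklist | newly_flagged


updated = run_detection(
    senders=["100", "200", "300", "400"],
    blacklist={"100"},
    whitelist={"200"},
    # Stand-in detectors; real ones would inspect timing or social features.
    detectors=[lambda s: s == "300", lambda s: s == "200"],
)
print(sorted(updated))
```

Note that the whitelisted number is never flagged even though one stand-in detector would report it, which matches the rule that whitelist users are exempt from monitoring.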

For a better understanding of the invention, the invention is further described below in conjunction with the drawings and specific embodiments. FIG. 1 is a schematic diagram of a spam short message monitoring system according to the present invention. As shown in FIG. 1, the spam short message monitoring system of the present invention mainly includes a detecting module and a monitoring module. The detecting module is configured to: if a short message sender is detected to be a spam sender according to a predetermined rule, add the short message sender to a blacklist, and then send the blacklist to the monitoring module. The monitoring module is configured to: monitor spam short messages according to the blacklist. The predetermined rule at least includes: if the timing feature with which the short message sender sends short messages within a predetermined time period is detected to be a predetermined timing feature (for example, the time interval between short messages sent per unit time is fixed), the short message sender is determined to be a spam sender; or if, among the pairwise combinations of the short message sender and all recipients of the sender's short messages within the predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairs is detected to be less than a predetermined value (for example, less than 10%), the short message sender is determined to be a spam sender; or if the timing feature with which the short message sender sends short messages within the predetermined time period is detected to be the predetermined timing feature, and the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations of the sender and all recipients within the predetermined time period is detected to be less than the predetermined value, the short message sender is determined to be a spam sender.
In this way, the spam short message monitoring system of the present invention can monitor spam short messages according to the timing features and/or spatial features of spam senders, so as to improve the precision and recall of spam message monitoring.
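
The three alternatives of the predetermined rule (timing only, pair ratio only, or both) can be captured in a few lines. This is a sketch under the assumption that the timing match and the pair ratio have already been computed elsewhere; `RuleMode` and `is_spam_sender` are hypothetical names, and the default of 0.10 reflects the "less than 10%" example in the text.

```python
from enum import Enum


class RuleMode(Enum):
    TIMING = "timing"   # timing feature alone decides
    SOCIAL = "social"   # pair ratio alone decides
    BOTH = "both"       # both conditions must hold


def is_spam_sender(timing_matches: bool, mutual_ratio: float, mode: RuleMode,
                   ratio_threshold: float = 0.10) -> bool:
    """Apply one alternative of the predetermined rule.

    timing_matches: whether the sender's timing feature equals the predetermined one.
    mutual_ratio:   pairs with mutual communication records / all pairwise combinations.
    """
    sparse_network = mutual_ratio < ratio_threshold
    if mode is RuleMode.TIMING:
        return timing_matches
    if mode is RuleMode.SOCIAL:
        return sparse_network
    return timing_matches and sparse_network


print(is_spam_sender(True, 0.50, RuleMode.TIMING))  # fixed interval is enough
print(is_spam_sender(True, 0.50, RuleMode.BOTH))    # dense social graph blocks the verdict
```

Requiring both conditions trades some recall for precision, for example for users who legitimately group-send festival greetings to people they regularly correspond with.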

Further, the spam short message monitoring system of the present invention may also include: a training module, configured to: extract historical short message records of known spam senders, train the predetermined timing feature from the frequency features with which the known spam senders sent short messages in the historical short message records, and then send the predetermined timing feature to the detecting module; and/or connect with edges the nodes that have mutual communication records in the historical short message records, so as to construct a social relationship network graph of a known spam sender and all recipients of that sender's short messages, train the predetermined value from the ratio of the number of edges to the total number of edges that would connect all of the nodes, and then send the predetermined value to the detecting module. In this way, the spam short message monitoring system of the present invention can train different timing feature models and spatial feature models for different operators.

FIG. 2 is a flowchart of a method for monitoring spam short messages according to the present invention. As shown in FIG. 2, the method of the present invention includes the following steps:

Step 10: Detect, according to a predetermined rule, whether a short message sender is a spam sender; if so, execute step 20; if not, repeat step 10.

Step 20: Add the short message sender to a blacklist so that spam short messages can be monitored.
The predetermined rule includes: if the timing feature with which the short message sender sends short messages within a predetermined time period is a predetermined timing feature (for example, the time interval between short messages sent within the predetermined time period is fixed), the short message sender is determined to be a spam sender; or if, among the pairwise combinations of the short message sender and all recipients of the sender's short messages within the predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairs is less than a predetermined value, the short message sender is determined to be a spam sender; or if the timing feature with which the short message sender sends short messages within the predetermined time period is the predetermined timing feature, and the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations of the sender and all recipients within the predetermined time period is less than the predetermined value, the short message sender is determined to be a spam sender. In this way, the spam short message monitoring method of the present invention can monitor spam short messages based on the timing features and/or spatial features of spam senders, so as to improve the precision and recall of spam message monitoring.
Preferably, before step 10, the following steps may also be included: extracting historical short message records of known spam senders, and training the predetermined timing feature from the frequency features with which the known spam senders sent short messages in the historical short message records; and/or connecting with edges the nodes that have mutual communication records in the historical short message records, so as to construct a social relationship network graph of a known spam sender and all recipients of that sender's short messages, and training the predetermined value from the ratio of the number of edges to the total number of edges that would connect all of the nodes.
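
The specification does not say how the predetermined value is derived from the edge ratios of known spam senders. One simple possibility, shown purely as an assumption, is to pick the ratio at or below which a chosen share of the labelled spam senders fall; `train_ratio_threshold` and `coverage` are hypothetical names.

```python
import math


def train_ratio_threshold(spammer_ratios: list[float], coverage: float = 0.95) -> float:
    """Pick a threshold from the edge ratios measured for known spam senders.

    Returns the smallest observed ratio such that at least `coverage` of the
    labelled spam senders have a ratio at or below it. This percentile rule is
    an illustrative assumption, not the training procedure of the patent.
    """
    if not spammer_ratios:
        raise ValueError("need at least one labelled spam sender")
    ordered = sorted(spammer_ratios)
    index = math.ceil(coverage * len(ordered)) - 1
    index = max(0, min(index, len(ordered) - 1))
    return ordered[index]


# Edge ratios of five known spam senders; one is an outlier with a dense graph.
print(train_ratio_threshold([0.5, 0.0, 0.02, 0.01, 0.03], coverage=0.7))
```

Choosing a coverage below 1.0 keeps a single atypical spam sender from pulling the threshold up to a value that would also catch many ordinary users.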

The invention will now be described in detail by way of specific examples. FIG. 3 is a schematic diagram of a spam short message monitoring system according to an embodiment of the present invention. As shown in FIG. 3, the spam monitoring system of the embodiment includes: a CDR (bill) preprocessing module, a training module, a manual labeling module, a detecting module, and a black and white list management module.

The CDR preprocessing module is configured to: preprocess the short message center CDRs, including: removing duplicate records, removing non-point-to-point short messages, removing non-target-operator CDRs, extracting useful fields, converting the format to the internal format of the system, and storing the result. Some CDR records are retries after a failure caused by the system; such records must be counted as a single short message. Some short message records are sent to users by the operator's customer service system rather than by users; they do not need to be monitored and are removed. The operator monitors only users belonging to that operator; when users of other operators send short messages to the operator's users, the short message center also generates CDR records, and such records do not need to be monitored. A CDR contains many fields, but spam monitoring uses only a few of them, so only the useful fields need to be extracted. In addition, the CDR must be converted into a format that the system can recognize internally. The CDR preprocessing module can obtain the original CDRs of the short message center through the File Transfer Protocol (FTP).

The training module is configured to: train on the historical CDRs of known spam senders to generate the model file used for spam detection.

The manual labeling module is configured to: before the spam sender model is trained, correctly label the user category of candidate users who may be spam senders, so that the model file obtained by training conforms more accurately to the regular patterns of spam senders.
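
The preprocessing operations listed above can be sketched as a single pass over the raw records. The field names (`msg_id`, `sender`, `recipient`, `submit_time`, `peer_to_peer`), the `Cdr` type, and the use of number prefixes to recognize the target operator are all assumptions made for illustration; real CDR layouts differ between short message centers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cdr:
    """Internal record format holding only the fields the detectors need."""
    msg_id: str
    sender: str
    recipient: str
    submit_time: float


def preprocess(raw: list[dict], carrier_prefixes: tuple[str, ...]) -> list[Cdr]:
    """Deduplicate, filter, and convert raw short message center records."""
    seen_ids = set()
    cleaned = []
    for row in raw:
        if row["msg_id"] in seen_ids:
            continue  # a retry of the same message counts as one short message
        seen_ids.add(row["msg_id"])
        if not row.get("peer_to_peer", True):
            continue  # e.g. messages sent by the operator's customer service system
        if not row["sender"].startswith(carrier_prefixes):
            continue  # sender belongs to another operator
        cleaned.append(Cdr(row["msg_id"], row["sender"], row["recipient"], row["submit_time"]))
    return sorted(cleaned, key=lambda cdr: cdr.submit_time)


raw = [  # made-up sample records
    {"msg_id": "m2", "sender": "1390002", "recipient": "1390009", "submit_time": 20.0},
    {"msg_id": "m1", "sender": "1390001", "recipient": "1390002", "submit_time": 10.0},
    {"msg_id": "m1", "sender": "1390001", "recipient": "1390002", "submit_time": 11.0},
    {"msg_id": "m3", "sender": "10086", "recipient": "1390001", "submit_time": 5.0,
     "peer_to_peer": False},
    {"msg_id": "m4", "sender": "1860001", "recipient": "1390001", "submit_time": 7.0},
]
print([cdr.msg_id for cdr in preprocess(raw, carrier_prefixes=("139",))])
```

Dropping unused fields at this stage keeps the records small, which matters because the detectors later scan every record in the window.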
The detecting module in this embodiment may include three detection modules. The online timing detection module is configured to: detect the timing features of short message senders online and derive a blacklist.

The online space detection module is configured to: detect the social network features of short message senders online and derive a blacklist. The offline space detection module is configured to: detect the social network features of short message senders offline and derive a blacklist. The black and white list management module is configured to: merge the blacklists of the above three detection modules and synchronize the result to the BOSS, and obtain the black and white lists from the BOSS and synchronize them to the detecting module. The black and white lists can also be synchronized between the black and white list management module and the BOSS by FTP.

FIG. 4 is a flowchart of a method for monitoring spam short messages according to an embodiment of the present invention. As shown in FIG. 4, the process includes the following steps:

Step 201: Acquire the original CDRs of the short message center and pre-process them. The pre-processing performed by the CDR pre-processing module includes: removing duplicate records, removing non-point-to-point short messages, removing CDRs of non-target operators, extracting useful fields, converting the records to the system's internal format, and sorting them by short message submission time. The useful fields extracted include: message identifier (id), sender number, recipient number, submission time, short message length, and short message content. The CDR pre-processing module then sends the pre-processed short message CDRs to the detection module.

Step 202: The detection module scans the pre-processed CDRs one by one and records only the submission time, the sender number, and the recipient number.

Step 203: The detection module filters each record against the blacklist and whitelist. If the sender is already on the blacklist or the whitelist, the record is ignored.

Step 204: Perform detection based on the timing features and/or spatial features of the short message sender, according to the model file generated by the training module. In this embodiment, detection may be performed online or offline. Online detection may examine the timing features of the short message sender and may also examine the sender's spatial features; the online timing detection module and the online spatial detection module analyze the features of the short messages that the user has sent during the current period of time. Offline detection generally analyzes the social relationship network features of the user over a longer historical period (for example, one week). The online timing detection module, the online spatial detection module, and the offline spatial detection module can operate in parallel or separately.

Step 205: Add the detected spam senders to the blacklist. If the three detection modules operate in parallel, each generates its blacklist independently, and the blacklist and whitelist management module merges the blacklists exported by the three modules to obtain the final blacklist. The three detection modules detect spam senders from three angles. Judging from detection results, the blacklists produced by the three methods largely coincide. The purpose of using the three methods in parallel is to let them complement one another: a small number of spam senders can be caught by some methods but not by others. For example, low-frequency spam senders are difficult to catch with the online detection methods but can be caught by the offline method. In addition, using the three methods in parallel raises the cost for spam senders of evading detection.

Step 206: The blacklist and whitelist management module synchronizes the blacklist to the BOSS.

The BOSS provides the blacklist to the control module of the short message center. Before delivering a short message, the short message center first checks whether the sender is on the blacklist; if so, the sender is prohibited from sending the short message.
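The merging of the three blacklists in Step 205 and the sender check performed by the short message center can be sketched as follows. The function names are hypothetical, and removing whitelisted numbers from the merged result is an assumption; the embodiment only states that the three blacklists are combined.

```python
def merge_blacklists(online_timing, online_spatial, offline_spatial, whitelist=()):
    """Union the three module blacklists; whitelisted numbers are never blocked."""
    merged = set(online_timing) | set(online_spatial) | set(offline_spatial)
    # Assumption: the whitelist takes precedence over any detection result.
    return merged - set(whitelist)


def may_send(sender, blacklist):
    """Check applied by the short message center before delivering a message."""
    return sender not in blacklist
```

Because the result is a plain union, a sender flagged by only one module (for example a low-frequency sender caught only offline) still ends up on the final blacklist.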

FIG. 5 is a flowchart of training a spam sender behavior model according to an embodiment of the present invention. As shown in FIG. 5, the process includes the following steps:

Step 301: Extract the CDRs of a historical period, pre-process them, and store them in the database.

Step 302: Obtain an initial candidate training set of users considered to be spam senders according to an existing empirical model. The existing empirical model refers to a set of parameters obtained by analyzing the timing features and spatial features of spam senders in the operator's historical CDR data.

Step 303: Evaluate the size of the training set. If the training set is not large enough, that is, it contains too few spam senders, a model file trained on it has little statistical significance, and it is necessary to return to step 301 to obtain more CDRs for training. If the training set is considered large enough, proceed to step 304.

Step 304: Label the training set manually. Using the labeling tool provided by the manual labeling module, view the short messages sent by each user in the training set, and classify and label the users according to manual judgment. Manual labeling usually determines whether a user has sent spam short messages by checking the content of the short messages the user sent; the criteria for spam short messages are generally set in line with the operator's requirements. Manual labeling usually divides users into four categories: normal short message senders, spam senders, mixed short message senders, and other short message senders. Mixed short message senders send both normal short messages and spam short messages; the messages of other short message senders are usually garbled messages or greeting messages sent by the operator.

Step 305: Extract the historical CDRs of the spam senders according to the labeling result, and train the timing features and the spatial features. The timing features can be converted into frequency-domain information. The extracted spatial feature parameters can include: the number of short messages sent, the number of short messages received, the number of recipients who reply to the short messages, the number of pairs of recipients having mutual communication records, and the like. The spatial feature model can be trained on the number of replied short messages, that is, the number of pairs having mutual communication records.

Step 306: Determine the sending patterns of spam senders by frequency-domain analysis and social relationship network analysis, and generate a model file based on timing features and a model file based on spatial features, respectively.

Step 307: Synchronize the generated model files to the detection module. The model files can be adjusted flexibly according to each operator's requirements for precision and recall. For example, if the operator wants higher recall, users labeled as mixed short message senders are also treated as spam senders during training; if the operator wants higher precision, only users labeled as spam senders are used for training.
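The spatial feature used above, the share of pairs with mutual communication records among all pairwise combinations of a sender and its recipients, can be sketched in Python as follows. Treating "mutual communication" as messages having been sent in both directions, and representing the message log as a set of (from, to) tuples, are assumptions for illustration; the threshold would come from the trained spatial model file.

```python
from itertools import combinations


def mutual_pair_ratio(sender, recipients, sent_log):
    """Ratio of node pairs with two-way traffic to all pairwise combinations.

    sent_log is an assumed set of (from_number, to_number) tuples.
    """
    nodes = sorted({sender, *recipients})
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    # A pair counts as connected only if each side has messaged the other.
    mutual = sum(1 for a, b in pairs
                 if (a, b) in sent_log and (b, a) in sent_log)
    return mutual / len(pairs)


def is_spam_by_spatial_feature(ratio, trained_threshold):
    """A sender whose contacts barely know each other falls below the threshold."""
    return ratio < trained_threshold
```

The intuition is that recipients of an ordinary user tend to know the sender and one another, so many pairs show two-way traffic, whereas recipients of a bulk sender are mostly strangers and the ratio stays close to zero.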

FIG. 6 is a flowchart of online detection according to an embodiment of the present invention. As shown in FIG. 6, the process includes the following steps:

Step 401: Scan the pre-processed CDRs one by one, and record only the submission time, the sender number, and the recipient number of each short message.

Step 402: Determine whether the trigger condition for online detection is met. If it is met, proceed to step 403 to start the online detection algorithm; otherwise, return to step 401 and continue scanning the CDRs. For example, the online detection algorithm is started when the number of short messages sent by a user per unit time exceeds a certain threshold, and this threshold can be adjusted according to actual detection results.

Step 403: Extract the timing features and spatial features of the short message sender in real time.

Step 404: After the timing features and spatial features of the short message sender have been determined, compare them with the trained model files to determine whether the sender is a spam sender.
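The trigger condition of Step 402 can be sketched as a per-sender sliding-window counter. The class name, the use of timestamps in seconds, and the window and threshold parameters are assumptions for illustration; the embodiment only states that detection starts once the number of short messages sent per unit time exceeds an adjustable threshold.

```python
from collections import defaultdict, deque


class SendRateTrigger:
    """Hypothetical trigger: fires when a sender exceeds a rate threshold."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self._times = defaultdict(deque)  # sender -> recent submission times

    def observe(self, sender, submit_ts):
        """Record one short message; return True if online detection should start.

        Assumes records arrive in submission-time order, as after pre-processing.
        """
        times = self._times[sender]
        times.append(submit_ts)
        # Discard submissions that have fallen out of the window.
        while submit_ts - times[0] >= self.window:
            times.popleft()
        return len(times) > self.threshold
```

Only senders for which `observe` returns True would go on to the feature extraction and model comparison of steps 403 and 404, which keeps the per-record cost of scanning low.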

One of ordinary skill in the art will appreciate that all or a portion of the above steps may be accomplished by a program instructing the associated hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software function module. The invention is not limited to any specific combination of hardware and software.

The above is only a preferred embodiment of the present invention, and of course, the present invention may be embodied in various other embodiments without departing from the spirit and scope of the invention. Corresponding changes and modifications are intended to be included within the scope of the appended claims.

Industrial Applicability The method and system for monitoring spam short messages provided by the present invention monitor spam short messages based on the timing and spatial features of sender behavior. They achieve high precision and recall and raise the cost for spam senders of evading detection. Because the content of short messages does not need to be scanned, system performance is also greatly improved.

Claims

1. A method for monitoring spam short messages, the method comprising: if a short message sender is detected to be a spam sender according to a predetermined rule, adding the short message sender to a blacklist to monitor spam short messages, wherein the predetermined rule at least includes: if the timing feature of the short messages sent by the short message sender within a predetermined time period is a predetermined timing feature, determining the short message sender to be a spam sender; or if, among the short message sender and all recipients to whom the short message sender sends short messages within a predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than a predetermined value, determining the short message sender to be a spam sender; or if the timing feature of the short messages sent by the short message sender within a predetermined time period is the predetermined timing feature, and, among the short message sender and all recipients to whom the short message sender sends short messages within the predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than the predetermined value, determining the short message sender to be a spam sender.

2. The method according to claim 1, wherein before the step of detecting that the short message sender is a spam sender according to the predetermined rule, the method further comprises: extracting historical short message records of known spam senders, and training the predetermined timing feature on the frequency features with which the known spam senders send short messages in the historical short message records; and/or connecting the nodes that have mutual communication records in the historical short message records to construct a social relationship network graph of each known spam sender and all recipients of its short messages, and training the predetermined value on the ratio of the number of edges in the graph to the total number of edges obtained when all the nodes are connected to one another.

3. The method of claim 1, wherein before the step of detecting that the short message sender is a spam sender according to the predetermined rule, the method further comprises: detecting that the number of short messages sent by the short message sender per unit time exceeds a threshold.

4. The method of claim 3, wherein the step of detecting that the short message sender is a spam sender according to the predetermined rule comprises: detecting online the short message CDRs of the short message sender within the current period of time, and if the timing feature with which the short message sender sends short messages is detected to be the predetermined timing feature, determining the short message sender to be a spam sender; or detecting online the short message CDRs of the short message sender within the current period of time, and if it is detected that, among the short message sender and all recipients of its short messages, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than the predetermined value, determining the short message sender to be a spam sender; or detecting online the short message CDRs of the short message sender within the current period of time, and if the timing feature with which the short message sender sends short messages is detected to be the predetermined timing feature, and it is detected that, among the short message sender and all recipients of its short messages, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than the predetermined value, determining the short message sender to be a spam sender.

5. The method of claim 4, wherein before the step of detecting that the short message sender is a spam sender according to the predetermined rule, the method further comprises: extracting the short message CDRs of the short message sender within the current period of time; and pre-processing the short message CDRs.

6. The method according to any one of claims 1 to 5, wherein before the step of detecting that the short message sender is a spam sender according to the predetermined rule, the method further comprises: detecting that the short message sender is on neither the blacklist nor a whitelist.

7. A system for monitoring spam short messages, the system comprising: a detection module configured to, if a short message sender is detected to be a spam sender according to a predetermined rule, add the short message sender to a blacklist and then send the blacklist to a monitoring module; and the monitoring module, configured to monitor spam short messages according to the blacklist; wherein the predetermined rule at least includes: if the timing feature of the short messages sent by the short message sender within a predetermined time period is detected to be a predetermined timing feature, determining the short message sender to be a spam sender; or if it is detected that, among the short message sender and all recipients to whom the short message sender sends short messages within a predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than a predetermined value, determining the short message sender to be a spam sender; or if the timing feature of the short messages sent by the short message sender within a predetermined time period is the predetermined timing feature, and, among the short message sender and all recipients to whom the short message sender sends short messages within the predetermined time period, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than the predetermined value, determining the short message sender to be a spam sender.

8. The system of claim 7, further comprising: a training module configured to: extract historical short message records of known spam senders, train the predetermined timing feature on the frequency features with which the known spam senders send short messages in the historical short message records, and then send the predetermined timing feature to the detection module; and/or connect the nodes that have mutual communication records in the historical short message records to construct a social relationship network graph of each known spam sender and all recipients of its short messages, train the predetermined value on the ratio of the number of edges in the graph to the total number of edges obtained when all the nodes are connected to one another, and then send the predetermined value to the detection module.

9. The system of claim 7, wherein the detection module comprises: an online detection module configured to: detect online the short message CDRs of the short message sender within the current period of time, and if the timing feature with which the short message sender sends short messages is detected to be the predetermined timing feature, determine the short message sender to be a spam sender; or detect online the short message CDRs of the short message sender within the current period of time, and if it is detected that, among the short message sender and all recipients of its short messages, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than the predetermined value, determine the short message sender to be a spam sender; or detect online the short message CDRs of the short message sender within the current period of time, and if the timing feature with which the short message sender sends short messages is detected to be the predetermined timing feature, and it is detected that, among the short message sender and all recipients of its short messages, the ratio of the number of pairs having mutual communication records to the total number of pairwise combinations is less than the predetermined value, determine the short message sender to be a spam sender.

10. The system according to claim 9, wherein the online detection module is further configured to detect, before detecting whether the short message sender is a spam sender, that the number of short messages sent by the short message sender per unit time exceeds a threshold.

11. The system of claim 9, further comprising: a bill pre-processing module configured to: extract the sender of the short message for a current period of time

12. The system according to any one of claims 7 to 11, wherein the detection module is further configured to detect, before detecting that the short message sender is a spam sender according to the predetermined rule, that the short message sender is on neither the blacklist nor a whitelist.