CN104284306B - Method, system, mobile terminal, and cloud server for filtering spam short messages - Google Patents

Method, system, mobile terminal, and cloud server for filtering spam short messages

Publication number
CN104284306B
Authority
CN
China
Prior art keywords
short message
mobile terminal
training set
owned
pending
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310279728.8A
Other languages
Chinese (zh)
Other versions
CN104284306A (en)
Inventor
何通庆
郭伟
方礼勇
杜国楹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eren Eben Information Technology Co Ltd
Original Assignee
Beijing Eren Eben Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eren Eben Information Technology Co Ltd
Priority to CN201310279728.8A
Publication of CN104284306A
Application granted
Publication of CN104284306B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12: Messaging; Mailboxes; Announcements
    • H04W4/14: Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12: Detection or prevention of fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An embodiment of the present invention discloses a method for filtering spam short messages, including: a mobile terminal classifies a pending short message according to a classification lexicon stored on the mobile terminal to obtain a classification result; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to a cloud server to update the private short-message training set corresponding to the mobile terminal; and the mobile terminal obtains lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal. Embodiments of the present invention also disclose a mobile terminal, a cloud server, and a spam short-message filtering system. In this way, the present invention can improve the efficiency with which the mobile terminal filters spam short messages and makes the filtering personalized.

Description

Method, system, mobile terminal, and cloud server for filtering spam short messages
Technical field
The present invention relates to the field of text classification, and in particular to a method, a system, a mobile terminal, and a cloud server for filtering spam short messages.
Background
With the rapid development of mobile communication technology and the rapid rise in mobile phone penetration, the short message, being brief, fast, simple, and cheap, has become an important means of communication and brings users great convenience. At the same time, spam short messages are increasingly rampant. Especially now that smartphones are spreading quickly and personal information security problems are growing more severe, many users are deeply troubled by spam short messages. A spam short message is a short message that the user has not subscribed to, that contains content such as advertising, deception, or pornography, or whose identical content is sent repeatedly within a short time, and that interferes with the user's normal use of the phone, work, and life. Common spam content includes advertising, pornography, false prize notifications, fraud, and pranks, that is, information of no value to the user. Such messages cause users much annoyance, so monitoring and filtering of spam short messages is urgently needed. The prior art mainly includes two methods of filtering spam short messages: one performs the filtering at a short-message processing center such as a short message service center (SMSC); the other runs the entire spam filtering process in an embedded program on a mobile terminal such as a mobile phone.
In the course of long-term research and development, the inventors of the present application found that some messages, such as lottery information, ticket information, and advertising, may be spam for some users but not for others. Filtering at the short message service center may therefore cause misclassified messages never to reach the user's mobile terminal, because such filtering does not take the differing needs of different users into account. In addition, because the computing speed and storage space of a mobile terminal are both relatively limited, running the entire spam filtering process on the mobile terminal consumes excessive time and space and affects the user's normal reception of short messages.
Summary of the invention
The main technical problem solved by the present invention is to provide a method, a system, a mobile terminal, and a cloud server for filtering spam short messages that can improve the efficiency with which a mobile terminal filters spam short messages and make the filtering personalized.
To solve the above technical problem, a first aspect of the present invention provides a method for filtering spam short messages, including: a mobile terminal classifies a pending short message according to the classification lexicon it stores to obtain a classification result, where the classification result is spam short message or non-spam short message; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to a cloud server to update the private short-message training set corresponding to the mobile terminal, where the classification error information includes the pending short message and the wrong classification result; and the mobile terminal obtains lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, where the lexicon update information is obtained by the cloud server learning the private short-message training set and the public short-message training set after the private short-message training set corresponding to the mobile terminal and/or the public short-message training set stored on the cloud server has been updated.
The step in which the mobile terminal classifies the pending short message according to the classification lexicon it stores to obtain the classification result specifically includes the following. The mobile terminal preprocesses the pending short message to obtain the word features and rule features corresponding to the pending short message. The mobile terminal substitutes the following values stored in the classification lexicon into the Bayesian classification formula to obtain the probability P(C1|X) that the pending short message is a spam short message: the proportion of spam short messages P(C1), the proportion of non-spam short messages P(C2), the matching probability P(xk|C1) of each word feature and rule feature in spam short messages, and the matching probability P(xk|C2) of each word feature and rule feature in non-spam short messages.
The mobile terminal obtains the probability P(C2|X) that the pending short message is a non-spam short message as follows:
P(C2|X) = 1 - P(C1|X)
The mobile terminal obtains the classification result of the pending short message: when P(C1|X) > P(C2|X), the pending short message is a spam short message; otherwise, it is a non-spam short message.
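The Bayesian classification formula itself is not reproduced in this text. The following is a minimal sketch, assuming the standard naive Bayes form in which the features are treated as conditionally independent, so that P(C1|X) = P(C1)·∏k P(xk|C1) / [P(C1)·∏k P(xk|C1) + P(C2)·∏k P(xk|C2)]. The lexicon layout, the feature names, the probabilities, and the choice to skip features that the lexicon does not contain are all illustrative assumptions, not taken from the patent.

```python
import math

def classify(features, lexicon):
    """Return (label, P(C1|X)) for the features matched in one pending message."""
    # Work in log space so that a product of many small probabilities
    # does not underflow.
    log_spam = math.log(lexicon["p_spam"])          # log P(C1)
    log_non = math.log(lexicon["p_non_spam"])       # log P(C2)
    for f in features:
        probs = lexicon["likelihood"].get(f)
        if probs is None:
            continue  # assumption: features unknown to the lexicon are ignored
        log_spam += math.log(probs["spam"])         # log P(xk|C1)
        log_non += math.log(probs["non-spam"])      # log P(xk|C2)
    # Normalise the two joint scores into P(C1|X) in a numerically stable way.
    m = max(log_spam, log_non)
    e_spam, e_non = math.exp(log_spam - m), math.exp(log_non - m)
    p_spam_given_x = e_spam / (e_spam + e_non)
    p_non_given_x = 1.0 - p_spam_given_x            # P(C2|X) = 1 - P(C1|X)
    label = "spam" if p_spam_given_x > p_non_given_x else "non-spam"
    return label, p_spam_given_x

# Hypothetical lexicon contents, for illustration only.
LEXICON = {
    "p_spam": 0.4,
    "p_non_spam": 0.6,
    "likelihood": {
        "prize":   {"spam": 0.30, "non-spam": 0.01},
        "meeting": {"spam": 0.02, "non-spam": 0.20},
        "has_url": {"spam": 0.50, "non-spam": 0.05},
    },
}

print(classify(["prize", "has_url"], LEXICON))
print(classify(["meeting"], LEXICON))
```

With no matched features, the sketch falls back to the stored class proportions, so the larger prior decides the label.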
Before the step in which the mobile terminal preprocesses the pending short message to obtain the corresponding word features and rule features, the method further includes the following. The mobile terminal judges whether the sender number of the pending short message is in the private blacklist or whitelist corresponding to the mobile terminal: when the sender number is in the private blacklist, the pending short message is a spam short message; when the sender number is in the private whitelist, the pending short message is a non-spam short message. When the sender number is in neither private list, the mobile terminal goes on to judge whether the sender number is in the public blacklist or whitelist: when the sender number is in the public blacklist, the pending short message is a spam short message; when the sender number is in the public whitelist, the pending short message is a non-spam short message. When the sender number is in neither public list, the mobile terminal performs the step of preprocessing the pending short message to obtain the corresponding word features and rule features.
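The sender-number check described above can be sketched as follows. The function name, the dictionary layout of the lists, and the sample numbers are hypothetical; checking the blacklist before the whitelist within one pair of lists is also an assumption, since the text does not say what happens if a number appears in both.

```python
def precheck_sender(sender, private_lists, public_lists):
    """Return "spam", "non-spam", or None if content classification is still needed."""
    # The private lists of this terminal are consulted first, then the public lists.
    for lists in (private_lists, public_lists):
        if sender in lists["black"]:
            return "spam"
        if sender in lists["white"]:
            return "non-spam"
    return None  # fall through to preprocessing and Bayesian classification

# Hypothetical list contents, for illustration only.
private_lists = {"black": {"10690001"}, "white": {"95500"}}
public_lists = {"black": {"95500", "10690002"}, "white": {"10086"}}

print(precheck_sender("95500", private_lists, public_lists))     # private whitelist wins
print(precheck_sender("10690002", private_lists, public_lists))  # public blacklist
print(precheck_sender("13800000000", private_lists, public_lists))
```

Because the private lists are checked first, a number that one user has whitelisted is accepted for that user even when the public blacklist contains it, which is how the ordering supports per-user filtering.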
When the classification result is judged to be wrong and the mobile terminal receives the upload instruction corresponding to the wrong classification result, the classification error information that the mobile terminal uploads to the cloud server further includes the sender number of the pending short message. The mobile terminal uploads the sender number so that the cloud server can judge whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or to the public blacklist or whitelist stored on the cloud server. When the private and/or public blacklist and whitelist stored on the cloud server are updated, the mobile terminal obtains the private list update information and/or the public list update information from the cloud server to synchronously update the public and/or private blacklist and whitelist stored on the mobile terminal.
A wrong classification result is one that classifies a pending short message that is spam as non-spam, or one that classifies a pending short message that is non-spam as spam. The lexicon update information includes at least, as of the update of the private short-message training set, the matching probability of the word features and rule features of the pending short message in spam or non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
To solve the above technical problem, a second aspect of the present invention provides a method for filtering spam short messages, including: a cloud server learns the private short-message training set corresponding to a mobile terminal and the public short-message training set that it stores to obtain a classification lexicon corresponding to the mobile terminal, where the mobile terminal uses the classification lexicon to classify a pending short message and obtain a classification result, the classification result being spam short message or non-spam short message; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the cloud server receives the classification error information uploaded by the mobile terminal, where the classification error information includes the pending short message and the wrong classification result; the cloud server adds the pending short message to the private short-message training set corresponding to the mobile terminal to update that training set; and after the private short-message training set and/or the public short-message training set is updated, the cloud server learns the private and public short-message training sets to obtain lexicon update information.
A wrong classification result is one that classifies a pending short message that is spam as non-spam, or one that classifies a pending short message that is non-spam as spam. When the wrong classification result classified a spam short message as non-spam, the step in which the cloud server, after the private short-message training set is updated, learns the private and public short-message training sets to obtain lexicon update information specifically includes: the cloud server preprocesses the pending short message to obtain its word features and rule features; and the cloud server obtains first lexicon update information from the matching frequency of the word features and rule features in spam short messages in the public short-message training set, their matching frequency in spam short messages in the private short-message training set, and the numbers of spam and non-spam short messages in the private and public short-message training sets, where the first lexicon update information includes, as of the update of the private short-message training set, the matching probability of the word features and rule features of the pending short message in spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages. When the wrong classification result classified a non-spam short message as spam, the same step specifically includes: the cloud server preprocesses the pending short message to obtain its word features and rule features; and the cloud server obtains second lexicon update information from the matching frequency of the word features and rule features in non-spam short messages in the public short-message training set, their matching frequency in non-spam short messages in the private short-message training set, and the numbers of spam and non-spam short messages in the private and public short-message training sets, where the second lexicon update information includes, as of the update of the private short-message training set, the matching probability of the word features and rule features of the pending short message in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
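This passage names the inputs and outputs of the lexicon update but not the arithmetic that connects them. The sketch below is one plausible reading, assuming that public and private counts are simply pooled and that add-one (Laplace) smoothing is applied to avoid zero probabilities. The pooling rule, the smoothing parameter `alpha`, the data layout, and all names and numbers are assumptions for illustration.

```python
from collections import Counter

CLASSES = ("spam", "non-spam")

def new_training_set():
    # Per-class message counts, and per-class counts of messages matching each feature.
    return {"n": {c: 0 for c in CLASSES}, "match": {c: Counter() for c in CLASSES}}

def add_message(training_set, features, label):
    """Add one classified message, given as its set of features, to a training set."""
    training_set["n"][label] += 1
    training_set["match"][label].update(set(features))

def lexicon_update(features, label, public, private, alpha=1.0):
    """Recompute only the lexicon entries touched by one corrected message."""
    # Pool the message counts of the public set and this terminal's private set.
    n = {c: public["n"][c] + private["n"][c] for c in CLASSES}
    total = n["spam"] + n["non-spam"]
    update = {
        "p_spam": n["spam"] / total,
        "p_non_spam": n["non-spam"] / total,
        "likelihood": {},
    }
    for f in set(features):
        matches = public["match"][label][f] + private["match"][label][f]
        # Smoothed matching probability of feature f within class `label`.
        update["likelihood"][f] = {label: (matches + alpha) / (n[label] + 2 * alpha)}
    return update

# Hypothetical counts: the user reports a spam message that was let through.
public = new_training_set()
public["n"].update({"spam": 100, "non-spam": 300})
public["match"]["spam"]["prize"] = 40
private = new_training_set()
add_message(private, {"prize", "lottery"}, "spam")
update = lexicon_update({"prize", "lottery"}, "spam", public, private)
print(update)
```

Here `label` is the correct class of the message: "spam" yields something like the first lexicon update information, and "non-spam" the second.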
The classification error information further includes the sender number of the pending short message. The cloud server judges whether to add the sender number to the private blacklist or whitelist corresponding to the mobile terminal and/or to the public blacklist or whitelist stored on the cloud server. If so, the cloud server updates the corresponding private and/or public lists to obtain private list update information and/or public list update information, so that the mobile terminal can synchronously update the public and/or private blacklist and whitelist that it stores.
To solve the above technical problem, a third aspect of the present invention provides a mobile terminal, including: a classification module, configured to classify a pending short message according to the classification lexicon stored on the mobile terminal to obtain a classification result, where the classification result is spam short message or non-spam short message, and the classification lexicon is obtained by a cloud server learning the private short-message training set corresponding to the mobile terminal and the public short-message training set that the cloud server stores; an upload module, configured to upload classification error information to the cloud server, when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, to update the private short-message training set corresponding to the mobile terminal, where the classification error information includes the pending short message and the wrong classification result; and a mobile terminal update module, configured to obtain lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, where the lexicon update information is obtained by the cloud server learning the private and public short-message training sets after the private short-message training set and/or the public short-message training set is updated.
To solve the above technical problem, a fourth aspect of the present invention provides a cloud server, including: a learning module, configured to learn the private short-message training set corresponding to a mobile terminal and the public short-message training set stored on the cloud server to obtain a classification lexicon corresponding to the mobile terminal, where the mobile terminal uses the classification lexicon to classify a pending short message and obtain a classification result, the classification result being spam short message or non-spam short message; and a cloud server update module, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, where the classification error information includes the pending short message and the wrong classification result. The cloud server update module is further configured to add the pending short message to the private short-message training set corresponding to the mobile terminal to update that training set. The learning module is further configured to learn the private and public short-message training sets after the private short-message training set and/or the public short-message training set is updated, to obtain lexicon update information, so that the mobile terminal synchronously updates the classification lexicon it stores according to the lexicon update information.
To solve the above technical problem, a fifth aspect of the present invention provides a spam short-message filtering system, including the mobile terminal and the cloud server described above.
The beneficial effects of the present invention are as follows. Unlike the prior art, in the present invention the mobile terminal classifies a pending short message according to the classification lexicon it stores to obtain a classification result; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to the cloud server to update the private short-message training set corresponding to the mobile terminal; and the mobile terminal obtains lexicon update information from the cloud server to synchronously update the classification lexicon it stores. Using the processing power of the cloud server, the updated private short-message training set and the public short-message training set are learned again to provide the mobile terminal with a classification lexicon that is both personalized and generally applicable. This continuously improves the accuracy with which the mobile terminal filters spam short messages, improves its filtering efficiency, and makes the filtering personalized.
Brief description of the drawings
Fig. 1 is a flowchart of a first embodiment of the method for filtering spam short messages of the present invention;
Fig. 2 is a flowchart of the mobile terminal classifying a pending short message according to the classification lexicon it stores to obtain a classification result, in the first embodiment of the method for filtering spam short messages of the present invention;
Fig. 3 is a flowchart of a second embodiment of the method for filtering spam short messages of the present invention;
Fig. 4 is a flowchart of the cloud server learning the private and public short-message training sets to obtain lexicon update information when the wrong classification result classified a spam short message as non-spam, in the second embodiment of the method for filtering spam short messages of the present invention;
Fig. 5 is a flowchart of the cloud server learning the private and public short-message training sets to obtain lexicon update information when the wrong classification result classified a non-spam short message as spam, in the second embodiment of the method for filtering spam short messages of the present invention;
Fig. 6 is a functional block diagram of an embodiment of the mobile terminal of the present invention;
Fig. 7 is a functional block diagram of an embodiment of the cloud server of the present invention;
Fig. 8 is a functional block diagram of an embodiment of the spam short-message filtering system of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the first embodiment of the method for filtering spam short messages of the present invention includes:
Step S101: classify the pending short message to obtain a classification result;
The mobile terminal classifies the pending short message according to the classification lexicon it stores to obtain the classification result corresponding to the pending short message, where the classification result is spam short message or non-spam short message. The classification lexicon stored on the mobile terminal is kept synchronized at all times with the classification lexicon stored on the cloud server, which the cloud server obtains by learning the private short-message training set corresponding to the mobile terminal and the public short-message training set that it stores. The private short-message training set corresponding to the mobile terminal may be empty, or may store classified spam and/or non-spam short messages uploaded by the mobile terminal. When the private short-message training set is empty, the classification lexicon is obtained by learning the public short-message training set and the empty private set, that is, only the public short-message training set is learned. When the private short-message training set is not empty, the classification lexicon is obtained by the cloud server learning both the private short-message training set corresponding to the mobile terminal and the public short-message training set. The cloud server stores one public short-message training set and multiple private short-message training sets, each of which corresponds to one mobile terminal.
The public short-message training set stores a certain number of classified spam and non-spam short messages, and all mobile terminals served by the cloud server share the one public short-message training set. A private short-message training set stores the classified spam and non-spam short messages uploaded by its mobile terminal, and different mobile terminals correspond to different private short-message training sets.
Step S102: upload classification error information to the cloud server to update the private short-message training set corresponding to the mobile terminal;
After the mobile terminal obtains the classification result of the pending short message, the user judges whether that classification result is wrong, where a wrong classification result is one that classifies a pending short message that is spam as non-spam, or one that classifies a pending short message that is non-spam as spam. A given short message may be spam for some users and non-spam for others, so different users may judge the correctness of the classification result for the same pending short message differently.
When the user judges the classification result to be wrong and the mobile terminal receives an upload instruction corresponding to that wrong classification result, the mobile terminal uploads the classification error information to the cloud server according to the upload instruction, so that the cloud server updates the private short-message training set corresponding to the mobile terminal. The classification error information includes the pending short message and the corresponding wrong classification result.
Step S103: obtain the lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal.
The mobile terminal obtains the lexicon update information from the cloud server to synchronously update the classification lexicon it stores. The lexicon update information is obtained by the cloud server learning the private and public short-message training sets after the private short-message training set corresponding to the mobile terminal and/or the public short-message training set stored on the cloud server is updated. In other words, lexicon update information is produced whenever at least one of the two training sets is updated, that is, in any of the following three cases: (1) the public short-message training set is updated; (2) the private short-message training set is updated; (3) the private and public short-message training sets are updated at the same time. The cloud server may periodically add a certain number of classified spam and/or non-spam short messages to the public short-message training set to update it. When the private short-message training set is empty, that is, it stores no classified short messages uploaded by the mobile terminal, and it has not been updated, the lexicon update information is obtained by learning the updated public short-message training set and the empty private set, that is, only the updated public short-message training set is learned. When the private short-message training set is not empty, the lexicon update information is obtained by the cloud server learning the private and public short-message training sets after the private and/or public short-message training set is updated. After the cloud server obtains the lexicon update information through learning, the mobile terminal downloads it from the cloud server over GPRS, WiFi, or the like. The mobile terminal only needs to download the lexicon update information, which is small, rather than the entire updated classification lexicon on the cloud server, to update the classification lexicon it stores, which reduces the data traffic needed to update the lexicon. The mobile terminal classifies subsequent pending short messages according to the updated classification lexicon, thereby forming a cycle.
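The point that the terminal downloads only a small update rather than the whole lexicon can be illustrated with a merge step on the terminal side. This is a minimal sketch; the function name, the update format (new class proportions plus only the changed matching probabilities), and all values are hypothetical.

```python
def apply_lexicon_update(lexicon, update):
    """Merge lexicon update information into the locally stored lexicon, in place."""
    lexicon["p_spam"] = update["p_spam"]
    lexicon["p_non_spam"] = update["p_non_spam"]
    for feature, probs in update["likelihood"].items():
        # Only the entries named in the update change; all other entries are kept.
        lexicon["likelihood"].setdefault(feature, {}).update(probs)
    return lexicon

# Hypothetical lexicon stored on the terminal.
lexicon = {
    "p_spam": 0.25,
    "p_non_spam": 0.75,
    "likelihood": {
        "prize":   {"spam": 0.40, "non-spam": 0.01},
        "meeting": {"spam": 0.02, "non-spam": 0.20},
    },
}
# Hypothetical update downloaded from the cloud server.
update = {
    "p_spam": 0.26,
    "p_non_spam": 0.74,
    "likelihood": {"prize": {"spam": 0.41}, "lottery": {"spam": 0.02}},
}
apply_lexicon_update(lexicon, update)
print(lexicon["likelihood"])
```

The update carries two changed entries regardless of how many features the full lexicon holds, which is the traffic saving the paragraph describes.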
Using the processing power of the cloud server, the present invention updates the private short-message training set corresponding to the mobile terminal according to the classification error information uploaded by the mobile terminal. After the private and/or public short-message training set is updated, the cloud server learns again in conjunction with the segmentation dictionary and the stop-word dictionary, and through this further learning provides the mobile terminal with a classification lexicon that captures both general and personalized characteristics. This continuously improves the speed and accuracy with which the mobile terminal filters spam short messages and improves its filtering efficiency, while also giving the mobile terminal personalized spam filtering that meets the different filtering needs of different users.
Referring to Fig. 2, mobile terminal dividing according to its storage in method for filtering spam short messages first embodiment of the present invention Class dictionary classifies to pending short message specifically includes following sub-step to obtain classification results:
Sub-step S1011:Pending short message is pre-processed with obtain the corresponding word feature of pending short message and Rule feature;
Mobile terminal pre-processes pending short message to obtain the corresponding word feature of pending short message and rule Feature specifically includes:
Mobile terminal segments pending short message, and the participle dictionary by inquiring its storage divides pending short message At word feature significant one by one, wherein Chinese word segmentation is that Chinese short message text segmentation is minimum at Chinese, energy is independent Movable, significant language element, that is, entry;It, will according to such as space of the separation mark between word for English short message text English short message text is separated into word feature one by one.The segmenting method of present embodiment is Word Intelligent Segmentation method, that is, utilizes hidden horse Er Kefu models (Hidden Markov Model, HMM) algorithm.In other embodiments, can also be used Dictionary based segment method, The methods of cutting labelling method, the participle method based on statistics, rule-based participle method are segmented, and are not made too many restrictions herein.
According to its stored stop-word dictionary, the mobile terminal deletes word features that do not contribute to short message classification, including single characters formed after segmentation, interjections, modal particles, pronouns, and the like.
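The segmentation and stop-word steps above can be sketched as follows. This is a minimal illustration, not the embodiment's HMM segmenter: it uses forward maximum matching, one of the dictionary-based alternatives the text mentions. The function names, the `max_len` parameter, and the sample dictionaries are assumptions made for the example.

```python
def segment(text, dictionary, max_len=4):
    """Split text into tokens.

    ASCII chunks (for example English words) are split on whitespace only.
    Other chunks are cut by forward maximum matching against `dictionary`.
    """
    tokens = []
    for chunk in text.split():
        if chunk.isascii():
            tokens.append(chunk.lower())
            continue
        i = 0
        while i < len(chunk):
            # Try the longest candidate first; fall back to a single character.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if size == 1 or piece in dictionary:
                    tokens.append(piece)
                    i += size
                    break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop single characters and anything listed in the stop-word dictionary."""
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


# Hypothetical sample data for illustration only.
words = segment("免费领取大奖 click now", {"免费", "领取", "大奖"})
features = remove_stop_words(words, {"now"})
```

Dropping every single-character token mirrors the text's statement that single characters left over after segmentation do not contribute to classification.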
After deleting the non-contributing word features, the mobile terminal further selects, from the remaining word features, those that contribute more to short message classification. It does so by calculating the mutual information MI(A;C) between whether each remaining word feature A appears and a class C, where the class C is one of two classes, spam short message C1 and non-spam short message C2. The calculation formula for the mutual information MI(A;C) is as follows:
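The formula itself is not reproduced in this text. Assuming the standard definition of mutual information between the binary event "word feature A appears" (a = 1) or "does not appear" (a = 0) and the class, it would take the following form; the patent's exact expression may differ.

```latex
MI(A;C) = \sum_{a \in \{0,1\}} \; \sum_{c \in \{C_1, C_2\}} P(a, c) \, \log \frac{P(a, c)}{P(a) \, P(c)}
```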
The word features with the highest mutual information MI(A;C) are then selected as the word features used for classification.
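A possible way to compute this selection from message counts is sketched below. It assumes the standard mutual information definition over a 2x2 table of counts; the function names, the count layout, and the parameter `k` are hypothetical.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """MI between 'feature appears' and 'message is spam' from a 2x2 count table.

    n11: spam messages containing the feature
    n10: non-spam messages containing the feature
    n01: spam messages without the feature
    n00: non-spam messages without the feature
    """
    total = n11 + n10 + n01 + n00
    mi = 0.0
    for joint, row, col in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if joint:  # a zero cell contributes nothing to the sum
            mi += joint / total * math.log(joint * total / (row * col))
    return mi


def select_features(counts, k):
    """Return the k features with the highest mutual information."""
    ranked = sorted(counts, key=lambda f: mutual_information(*counts[f]), reverse=True)
    return ranked[:k]


# Hypothetical counts: "prize" separates the classes, "the" does not.
selected = select_features({"prize": (10, 0, 0, 10), "the": (10, 10, 10, 10)}, k=1)
```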
The mobile terminal obtains the rule features of the pending short message. The rule features include the short message length, whether the message contains a URL, whether it contains a telephone number, and whether the sender's number is a mobile phone number.
The pending short message X is expressed as X = {x1, x2, …, xn}, where xk (k = 1, 2, …, n) are the word features and rule features corresponding to the pending short message.
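The rule features and the combined representation X could be extracted along the following lines. The regular expressions, the 70-character length bucket, and the feature naming are assumptions chosen for the sketch; the text does not specify how each rule is tested or how the length is discretized.

```python
import re

URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\d{7,}")               # assumed: any run of 7 or more digits
MOBILE_RE = re.compile(r"^(\+?86)?1\d{10}$")   # assumed: mainland China mobile format


def rule_features(text, sender):
    """Compute the four rule features named in the text."""
    return {
        "length": "long" if len(text) > 70 else "short",  # assumed bucketing
        "has_url": bool(URL_RE.search(text)),
        "has_phone": bool(PHONE_RE.search(text)),
        "sender_is_mobile": bool(MOBILE_RE.match(sender)),
    }


def to_feature_set(word_features, rules):
    """Build X = {x1, ..., xn} as a flat list of word and rule features."""
    return list(word_features) + [f"{name}={value}" for name, value in sorted(rules.items())]


rules = rule_features("Win now http://a.example call 13800138000", "13912345678")
x = to_feature_set(["win", "call"], rules)
```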
Sub-step S1012: Substitute the proportion of spam short messages, the proportion of non-spam short messages, and the matching probabilities of the word features and rule features in spam short messages and in non-spam short messages into the Bayes classification formula;
The mobile terminal substitutes the following values stored in the classification lexicon into the Bayes classification formula to obtain the probability P(C1|X) that the pending short message is a spam short message: the proportion P(C1) of spam short messages, the proportion P(C2) of non-spam short messages, and the matching probabilities P(xk|C1) in spam short messages and P(xk|C2) in non-spam short messages of the word features and rule features xk corresponding to the pending short message. The Bayes classification formula is as follows:
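The formula itself is not reproduced in this text. Assuming the usual naive Bayes form, in which the features xk are treated as conditionally independent given the class, it would read as below. This form uses only the quantities named above and is consistent with the relation P(C2|X) = 1 - P(C1|X) used in sub-step S1013.

```latex
P(C_1 \mid X) = \frac{P(C_1) \prod_{k=1}^{n} P(x_k \mid C_1)}
                     {P(C_1) \prod_{k=1}^{n} P(x_k \mid C_1) + P(C_2) \prod_{k=1}^{n} P(x_k \mid C_2)}
```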
Here, the proportion P(C1) of spam short messages is the ratio of the number of spam short messages to the number of all short messages (spam and non-spam) in the private short message training set corresponding to the mobile terminal and the public short message training set; the proportion P(C2) of non-spam short messages is the ratio of the number of non-spam short messages to the number of all short messages in those training sets. The classification lexicon corresponding to the mobile terminal stores P(C1), P(C2), and the matching probabilities P(xk|C1) and P(xk|C2) of the word features and rule features in spam and non-spam short messages. Different mobile terminals correspond to different classification lexicons.
Sub-step S1013: Obtain the probability that the pending short message is a non-spam short message;
The mobile terminal further obtains the probability P(C2|X) that the pending short message is a non-spam short message, as follows:
P(C2|X)=1-P(C1|X)
In other embodiments, the probability that the pending short message is a non-spam short message may also be obtained with the Bayes classification formula, which is not limited here.
Sub-step S1014: Obtain the classification result of the pending short message.
The mobile terminal obtains the classification result of the pending short message from the probability P(C1|X) that it is a spam short message and the probability P(C2|X) that it is a non-spam short message: when P(C1|X) > P(C2|X), the pending short message is classified as a spam short message; otherwise it is classified as a non-spam short message. Because P(C2|X) = 1 - P(C1|X), the same decision can also be made by checking whether P(C1|X) is greater than 0.5: if so, the pending short message is a spam short message; otherwise it is a non-spam short message.
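Sub-steps S1012 to S1014 can be sketched as a small naive Bayes classifier. The lexicon layout (`p_spam`, `p_ham`, `match`) is a hypothetical structure, and two further assumptions are made: features missing from the lexicon are skipped, and all stored probabilities are strictly positive. Logarithms are used only to avoid numeric underflow on long messages.

```python
import math


def spam_probability(features, lexicon):
    """Return P(C1|X) under the naive Bayes assumption."""
    log_spam = math.log(lexicon["p_spam"])
    log_ham = math.log(lexicon["p_ham"])
    for x in features:
        if x not in lexicon["match"]:
            continue  # assumption: unknown features are ignored
        p_x_spam, p_x_ham = lexicon["match"][x]
        log_spam += math.log(p_x_spam)
        log_ham += math.log(p_x_ham)
    # Normalise the two scores so that they sum to 1.
    top = max(log_spam, log_ham)
    spam_score = math.exp(log_spam - top)
    ham_score = math.exp(log_ham - top)
    return spam_score / (spam_score + ham_score)


def classify(features, lexicon):
    """Spam if P(C1|X) > 0.5, which is the same test as P(C1|X) > P(C2|X)."""
    return "spam" if spam_probability(features, lexicon) > 0.5 else "non-spam"


# Hypothetical lexicon for illustration only.
lexicon = {
    "p_spam": 0.5,
    "p_ham": 0.5,
    "match": {"prize": (0.8, 0.1), "meeting": (0.05, 0.5)},
}
result = classify(["prize"], lexicon)
```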
When the classification result is judged to be wrong and the mobile terminal receives an instruction to upload the wrong classification result, at least the private short message training set corresponding to the mobile terminal stored on the Cloud Server is updated. The lexicon update information obtained as a result includes at least the following, as recomputed after the update of the private short message training set: the matching probabilities in spam or non-spam short messages of the word features and rule features corresponding to the pending short message, the proportion P(C1) of spam short messages, and the proportion P(C2) of non-spam short messages. Specifically, when the wrong classification result is that a pending short message that is spam was classified as non-spam, the matching probabilities of the word features and rule features in spam short messages, P(C1), and P(C2) are updated. When the wrong classification result is that a pending short message that is non-spam was classified as spam, the matching probabilities of the word features and rule features in non-spam short messages, P(C1), and P(C2) are updated.
In addition, before the step in which the mobile terminal preprocesses the pending short message to obtain the corresponding word features and rule features, the method further includes:
The mobile terminal determines whether the sender's number of the pending short message is in the private blacklist or whitelist corresponding to the mobile terminal. When the sender's number is in the private blacklist corresponding to the mobile terminal, the pending short message is a spam short message; when the sender's number is in the private whitelist corresponding to the mobile terminal, the pending short message is a non-spam short message.
When the sender's number is in neither the private blacklist nor the private whitelist corresponding to the mobile terminal, the mobile terminal goes on to determine whether the sender's number is in the public blacklist or whitelist. When the sender's number is in the public blacklist, the pending short message is a spam short message; when the sender's number is in the public whitelist, the pending short message is a non-spam short message.
When the sender's number is in neither the public blacklist nor the public whitelist, the mobile terminal performs the preprocessing step described above, that is, sub-step S1011, to obtain the word features and rule features corresponding to the pending short message.
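The order of the list checks described above can be sketched as follows. The function name and the use of plain sets for the lists are assumptions; returning `None` stands for "no list matched, continue with sub-step S1011".

```python
def check_lists(sender, private_black, private_white, public_black, public_white):
    """Check the private lists first, then the public lists."""
    if sender in private_black:
        return "spam"
    if sender in private_white:
        return "non-spam"
    if sender in public_black:
        return "spam"
    if sender in public_white:
        return "non-spam"
    return None  # not listed anywhere: fall through to content-based classification


# Hypothetical numbers for illustration only.
verdict = check_lists("10690000", {"10690000"}, set(), set(), {"10690000"})
```

Checking the private lists first means a user's own decision about a sender takes precedence over the public lists.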
After the mobile terminal has performed step S101 above and obtained the classification result of the pending short message, when the classification result is judged to be wrong and the mobile terminal receives an instruction to upload the wrong classification result, the classification error information that the mobile terminal uploads to the Cloud Server further includes the sender's number of the pending short message. The mobile terminal uploads the sender's number so that the Cloud Server can decide whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or to the public blacklist or whitelist stored on the Cloud Server; if so, the Cloud Server updates those lists. Specifically, after the mobile terminal uploads the sender's number of the pending short message, the Cloud Server first adds the sender's number to the private blacklist or whitelist corresponding to the mobile terminal, and adds it to the public blacklist or whitelist only after the reports of that sender's number reach a certain quantity. For example, if more than a preset number of users, such as 10,000, report a sender's number, that sender's number is added to the public blacklist; if more than another preset number of users, such as 100, report a sender's number and the short message content clearly contains illegal content, that sender's number is likewise added to the public blacklist.
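The reporting thresholds in this example could be tracked on the Cloud Server roughly as follows. The class name, its attributes, and the `illegal_content` flag are hypothetical; how the server decides that content is clearly illegal is not described in the text. The sketch covers only spam reports and blacklists.

```python
from collections import defaultdict


class ReportTracker:
    """Count distinct reporting users per sender and promote senders to the public blacklist."""

    def __init__(self, plain_threshold=10000, illegal_threshold=100):
        self.plain_threshold = plain_threshold
        self.illegal_threshold = illegal_threshold
        self.reporters = defaultdict(set)           # sender -> users who reported it
        self.private_blacklists = defaultdict(set)  # user -> senders that user reported
        self.public_blacklist = set()

    def report(self, user_id, sender, illegal_content=False):
        # The sender always goes into the reporting user's private blacklist first.
        self.private_blacklists[user_id].add(sender)
        self.reporters[sender].add(user_id)
        count = len(self.reporters[sender])
        if count > self.plain_threshold or (illegal_content and count > self.illegal_threshold):
            self.public_blacklist.add(sender)


# Small thresholds so the behaviour is easy to follow.
tracker = ReportTracker(plain_threshold=3, illegal_threshold=1)
for user in ("u1", "u2", "u3"):
    tracker.report(user, "sender-a")
tracker.report("u1", "sender-b", illegal_content=True)
tracker.report("u2", "sender-b", illegal_content=True)
```

Storing reporters in a set means repeated reports from the same user count once, which matches the text's wording of a number of users rather than a number of reports.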
When the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored on the Cloud Server are updated, the mobile terminal obtains the private and/or public list update information from the Cloud Server over GPRS, WiFi, or a similar connection, and synchronously updates the public and/or private blacklist and whitelist it stores. The mobile terminal then uses the updated lists to judge subsequent pending short messages. For example, after a pending short message is correctly judged to be a spam short message, or after a pending short message that is spam is classified as non-spam, the sender's number of the pending short message is uploaded to the Cloud Server, which adds that sender's number to the private blacklist or whitelist corresponding to the mobile terminal.
It can be understood that, in the first embodiment of the spam short message filtering method of the present invention, the mobile terminal classifies short messages according to the classification lexicon that the Cloud Server obtains by learning the private short message training set corresponding to the mobile terminal and the public short message training set. When a classification result is wrong, the mobile terminal uploads the classification error information so that the classification lexicon corresponding to the mobile terminal is updated promptly. The mobile terminal can classify without itself learning from short message samples, which improves its filtering efficiency for spam short messages. Because different mobile terminals correspond to different private short message training sets and classification lexicons, spam filtering is personalized and its accuracy is improved.
In addition, the present invention obtains the word features of a short message using the word segmentation dictionary and the stop-word dictionary, and also obtains rule features such as the short message length, whether the message contains a URL, whether it contains a telephone number, and whether the sender's number is a mobile phone number. By substituting the matching probabilities of the word features and rule features into the Bayes classification formula, the probability that the pending short message is spam is calculated directly and more accurately, and a judgment is made quickly. The calculation is simple and fast, which greatly reduces the processing workload of the mobile terminal.
Referring to Fig. 3, the second embodiment of the spam short message filtering method of the present invention includes:
Step S201: Learn the private short message training set and the public short message training set;
The Cloud Server learns the private short message training set corresponding to the mobile terminal and the public short message training set that it stores, to obtain the classification lexicon corresponding to the mobile terminal. The private short message training set corresponding to the mobile terminal may be empty, or may store classified spam and/or non-spam short messages uploaded by the mobile terminal. When the private short message training set is empty, the Cloud Server learns the public short message training set and the empty private short message training set to obtain the classification lexicon; that is, in this case the Cloud Server learns only the public short message training set. When the private short message training set is not empty, the Cloud Server learns both the private short message training set corresponding to the mobile terminal and the public short message training set to obtain the classification lexicon. This learning specifically includes the following: according to its stored word segmentation dictionary and stop-word dictionary, the Cloud Server preprocesses the private short message training set corresponding to the mobile terminal and the public short message training set to obtain the word features and rule features corresponding to each spam and non-spam short message in the two training sets; then, according to the number of spam short messages and the number of non-spam short messages, it obtains the matching probability of each word feature and rule feature in spam and in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
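The learning step can be sketched as counting over the two training sets. The sketch assumes that each training message has already been preprocessed into a feature list, that the "matching frequency" of a feature is the number of messages of a class that contain it, and that both classes are non-empty. The function name and the returned layout are hypothetical, and no smoothing is applied because the text does not describe any.

```python
from collections import Counter


def learn(private_set, public_set):
    """Build a classification lexicon from lists of (features, is_spam) pairs."""
    messages = private_set + public_set  # an empty private set simply adds nothing
    n_spam = sum(1 for _, is_spam in messages if is_spam)
    n_ham = len(messages) - n_spam
    spam_freq, ham_freq = Counter(), Counter()
    for features, is_spam in messages:
        # Count each feature at most once per message.
        (spam_freq if is_spam else ham_freq).update(set(features))
    all_features = set(spam_freq) | set(ham_freq)
    return {
        "p_spam": n_spam / len(messages),
        "p_ham": n_ham / len(messages),
        "match": {x: (spam_freq[x] / n_spam, ham_freq[x] / n_ham) for x in all_features},
    }


# Hypothetical training data for illustration only.
public = [(["prize", "free"], True), (["meeting"], False)]
private = [(["prize"], True)]
lexicon = learn(private, public)
```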
When the word segmentation dictionary and/or the stop-word dictionary stored on the Cloud Server is updated, the word segmentation dictionary and/or stop-word dictionary stored on the mobile terminal is updated synchronously with the Cloud Server. The mobile terminal uses the classification lexicon to classify the pending short message and obtain a classification result, which is either spam short message or non-spam short message. For each mobile terminal, the Cloud Server stores a corresponding classification lexicon. The public short message training set stores a certain number of classified spam and non-spam short messages.
Before classifying a pending short message for the first time, the mobile terminal may upload a certain number of short messages that the user has judged to be spam or non-spam, to be stored in the private short message training set corresponding to the mobile terminal on the Cloud Server; alternatively, the private short message training set corresponding to the mobile terminal may be empty when spam filtering begins. Before classifying a pending short message for the first time, the mobile terminal obtains the classification lexicon corresponding to the mobile terminal stored on the Cloud Server over GPRS, WiFi, or a similar connection, in order to perform classification.
Step S202: Receive the classification error information uploaded by the mobile terminal;
When the user judges that the classification result obtained by the mobile terminal is wrong and the mobile terminal receives an instruction to upload the wrong classification result, the Cloud Server receives the classification error information uploaded by the mobile terminal. The classification error information includes the pending short message and the wrong classification result, where the wrong classification result is either that a pending short message that is spam was classified as non-spam, or that a pending short message that is non-spam was classified as spam.
Step S203: Add the pending short message to the private short message training set;
The Cloud Server adds the pending short message in the classification error information to the private short message training set corresponding to the mobile terminal, thereby updating that training set. When the wrong classification result uploaded by the mobile terminal is that a pending short message that is spam was classified as non-spam, the Cloud Server adds the pending short message to the spam class of the private short message training set; when the wrong classification result is that a pending short message that is non-spam was classified as spam, the Cloud Server adds the pending short message to the non-spam class of the private short message training set.
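Step S203 amounts to filing the message under the opposite of the wrong label. A minimal sketch, with a hypothetical dictionary layout for the private training set:

```python
def apply_error_report(private_set, message, wrong_label):
    """Store a misclassified message under its correct class and return that class."""
    correct_label = "non-spam" if wrong_label == "spam" else "spam"
    private_set[correct_label].append(message)
    return correct_label


# Hypothetical example: the terminal labelled a spam message as non-spam.
private_set = {"spam": [], "non-spam": []}
stored_as = apply_error_report(private_set, "You won a prize", wrong_label="non-spam")
```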
Step S204: Learn the private short message training set and the public short message training set.
After the private short message training set and/or the public short message training set is updated, the Cloud Server learns the private short message training set and the public short message training set to obtain lexicon update information. There are two cases: (1) when the private short message training set is empty, that is, it stores no classified short messages uploaded by the mobile terminal and has not been updated, the lexicon update information is obtained by the Cloud Server learning the updated public short message training set; (2) when the private short message training set is not empty, the lexicon update information is obtained by the Cloud Server learning the public short message training set and the private short message training set after the private and/or public short message training set is updated. The mobile terminal synchronously updates the classification lexicon it stores according to the lexicon update information, and the classification lexicon on the Cloud Server is updated according to the same information; the lexicon update information may be stored in the classification lexicon corresponding to the mobile terminal on the Cloud Server. Before a pending short message is classified, the matching probabilities in spam and non-spam short messages of each word feature and rule feature in the classification lexicon stored on the mobile terminal are kept synchronized with the classification lexicon corresponding to the mobile terminal stored on the Cloud Server.
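On the mobile terminal, applying lexicon update information could look like the following merge. The update layout is an assumption: each feature carries a pair of matching probabilities (spam, non-spam), with `None` marking the side that the update leaves unchanged, and a feature not yet in the lexicon starts from 0.0 on the side that is not supplied.

```python
def apply_update(lexicon, update):
    """Merge lexicon update information into the locally stored classification lexicon."""
    lexicon["p_spam"] = update["p_spam"]
    lexicon["p_ham"] = update["p_ham"]
    for feature, (new_spam, new_ham) in update["match"].items():
        old_spam, old_ham = lexicon["match"].get(feature, (0.0, 0.0))
        lexicon["match"][feature] = (
            old_spam if new_spam is None else new_spam,
            old_ham if new_ham is None else new_ham,
        )
    return lexicon


# Hypothetical data: an update that changes only spam-side probabilities.
local = {"p_spam": 0.5, "p_ham": 0.5, "match": {"prize": (0.5, 0.1)}}
update = {"p_spam": 0.6, "p_ham": 0.4, "match": {"prize": (0.7, None), "loan": (0.2, None)}}
apply_update(local, update)
```

Only the changed entries travel over the network, which is the traffic saving the text attributes to downloading update information instead of the whole classification lexicon.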
Referring to Fig. 4, when the wrong classification result is that a pending short message that is spam was classified as non-spam, in the second embodiment of the spam short message filtering method of the present invention, the step in which the Cloud Server, after the private short message training set is updated, learns the private short message training set and the public short message training set to obtain lexicon update information specifically includes the following sub-steps:
Sub-step S2041a: Preprocess the pending short message to obtain the word features and rule features corresponding to the pending short message;
The Cloud Server preprocesses the pending short message to obtain the corresponding word features and rule features. The pending short message X is expressed as X = {x1, x2, …, xn}, where xk (k = 1, 2, …, n) are the word features and rule features corresponding to the pending short message.
Sub-step S2042a: Obtain the first lexicon update information according to the matching frequencies of the word features and rule features, the number of spam short messages, and the number of non-spam short messages.
The Cloud Server obtains the first lexicon update information according to the following: the matching frequency in spam short messages, within the public short message training set, of the word features and rule features xk corresponding to the pending short message; the matching frequency of xk in spam short messages within the private short message training set; and the numbers of spam and non-spam short messages in the private and public short message training sets. The first lexicon update information includes the following, as recomputed after the update of the private short message training set corresponding to the mobile terminal: the matching probability in spam short messages of the word features and rule features xk corresponding to the pending short message, the proportion of spam short messages, and the proportion of non-spam short messages. The mobile terminal synchronously updates the classification lexicon according to the first lexicon update information: it modifies the matching probabilities in spam short messages of the word features and rule features xk already stored in the classification lexicon corresponding to the mobile terminal, adds to the classification lexicon the matching probabilities in spam short messages of word features not yet contained in it, and modifies the stored proportions of spam and non-spam short messages. The matching probability of a word feature or rule feature xk in spam short messages equals the sum of its matching frequency in spam short messages in the public short message training set and its matching frequency in spam short messages in the private short message training set, divided by the number of spam short messages in the private and public short message training sets.
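The relation stated at the end of the preceding paragraph can be written compactly. The symbols f and N are introduced here only for illustration: f_pub and f_priv denote the matching frequency of xk in spam short messages in the public and private training sets, and N_pub and N_priv denote the number of spam short messages in each set.

```latex
P(x_k \mid C_1) = \frac{f_{\mathrm{pub}}(x_k, C_1) + f_{\mathrm{priv}}(x_k, C_1)}
                       {N_{\mathrm{pub}}(C_1) + N_{\mathrm{priv}}(C_1)}
```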
Referring to Fig. 5, when the wrong classification result is that a pending short message that is non-spam was classified as spam, in the second embodiment of the spam short message filtering method of the present invention, the step in which the Cloud Server, after the private short message training set is updated, learns the private short message training set and the public short message training set to obtain lexicon update information specifically includes the following sub-steps:
Sub-step S2041b: Preprocess the pending short message to obtain the word features and rule features corresponding to the pending short message;
The Cloud Server preprocesses the pending short message to obtain the corresponding word features and rule features.
Sub-step S2042b: Obtain the second lexicon update information according to the matching frequencies of the word features and rule features, the number of spam short messages, and the number of non-spam short messages.
The Cloud Server obtains the second lexicon update information according to the following: the matching frequency in non-spam short messages, within the public short message training set, of the word features and rule features xk corresponding to the pending short message; the matching frequency of xk in non-spam short messages within the private short message training set; and the numbers of spam and non-spam short messages in the private and public short message training sets. The second lexicon update information includes the following, as recomputed after the update of the private short message training set corresponding to the mobile terminal: the matching probability in non-spam short messages of the word features and rule features xk corresponding to the pending short message, the proportion of spam short messages, and the proportion of non-spam short messages. The mobile terminal synchronously updates the classification lexicon according to the second lexicon update information: it modifies the matching probabilities in non-spam short messages of the word features and rule features xk already stored in the classification lexicon corresponding to the mobile terminal, adds to the classification lexicon the matching probabilities in non-spam short messages of word features not yet contained in it, and modifies the stored proportions of spam and non-spam short messages. The matching probability of a word feature or rule feature xk in non-spam short messages equals the sum of its matching frequency in non-spam short messages in the public short message training set and its matching frequency in non-spam short messages in the private short message training set, divided by the number of non-spam short messages in the private and public short message training sets.
When the public short message training set is updated, the update may add spam short messages, add non-spam short messages, or add both. In the same way as the update and learning of the private short message training set described above, the short messages in the updated part of the public short message training set are preprocessed, and the corresponding lexicon update information is then obtained from the matching frequencies of the word features and rule features, the number of spam short messages, and the number of non-spam short messages, in order to update the matching probabilities of the word features and rule features in spam and/or non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages. When the private and public short message training sets are updated at the same time, the procedure is likewise the same as for the updates of the public and private short message training sets described above, and is not repeated here.
The Cloud Server learning the private short message training set corresponding to the mobile terminal and the public short message training set means obtaining, from the matching frequencies of the word features and rule features in spam and non-spam short messages, the number of spam short messages, and the number of non-spam short messages, the matching probabilities of the word features and rule features in spam and non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages. The resulting matching probabilities and proportions are stored in the classification lexicon; different mobile terminals correspond to different classification lexicons. When the private and/or public short message training set is updated, the Cloud Server needs to preprocess only the short messages in the updated part, because it retains the word features and rule features of each short message that was in the training sets before the update. This improves the efficiency of preprocessing and learning on the Cloud Server, and in turn the efficiency of updating the classification lexicon.
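The incremental behaviour described above, where only newly added messages are preprocessed, can be sketched by keeping running counts. The class and its fields are hypothetical, the counts are assumed to cover the private and public training sets together, and the returned dictionary stands in for the lexicon update information.

```python
from collections import Counter


class IncrementalLearner:
    """Keep per-class message counts and feature frequencies across updates."""

    def __init__(self):
        self.count = {"spam": 0, "non-spam": 0}
        self.freq = {"spam": Counter(), "non-spam": Counter()}

    def add(self, features, label):
        """Add one preprocessed message and return update information for its features."""
        self.count[label] += 1
        self.freq[label].update(set(features))  # earlier messages are not reprocessed
        total = self.count["spam"] + self.count["non-spam"]
        return {
            "label": label,
            "p_spam": self.count["spam"] / total,
            "p_ham": self.count["non-spam"] / total,
            "match": {x: self.freq[label][x] / self.count[label] for x in set(features)},
        }


# Hypothetical sequence of training messages.
learner = IncrementalLearner()
learner.add(["prize"], "spam")
learner.add(["meeting"], "non-spam")
info = learner.add(["prize", "loan"], "spam")
```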
In addition, further including the sender of pending short message in the classification error information that cloud server mobile terminal uploads Number, Cloud Server judges whether the addition Cloud Server storage of sender's number and movement after receiving sender's number In the corresponding privately owned black and white lists of terminal and/or publicly-owned black and white lists, if then Cloud Server update is corresponding with mobile terminal Privately owned black and white lists and/or publicly-owned black and white lists are updated with obtaining privately owned black and white lists fresh information and/or publicly-owned black and white lists Information, so that the publicly-owned black and white lists and/or privately owned black and white lists of mobile terminal synchronization update mobile terminal storage.Publicly-owned black and white List fresh information, privately owned black and white lists fresh information include that sender's number and sender's number correspond to the name being added It is single.For example, if more than a preset quantity such as 10,000 one sender's numbers of user's report that sender's number addition is publicly-owned black In list;When more than another preset quantity such as 100 one sender's numbers of user's report simultaneously the short message content obviously containing against Then sender's number is added in publicly-owned blacklist for method content.Belong in another example obtaining pending short message in correct judgement Refuse messages will belong to the pending SMS classified for after non-junk short message of refuse messages, by the corresponding transmission of pending short message Person's number uploads in Cloud Server, and Cloud Server further sender's number is added corresponding with mobile terminal privately owned black In white list.
It is appreciated that method for filtering spam short messages second embodiment of the present invention it is stored by Cloud Server with shifting The dynamic corresponding privately owned short message training set of terminal and publicly-owned short message training set are learnt to obtain classification corresponding with mobile terminal Dictionary, mobile terminal carry out classification judgement according to classified lexicon to pending short message, when the classification for receiving mobile terminal upload Cloud Server is learnt and obtains Word library updating information after error message, and then mobile terminal synchronization update mobile terminal is made to deposit The classified lexicon of storage, Cloud Server store larger publicly-owned short message training set, privately owned short message training set and the execution of occupied space The larger learning process of calculation amount can improve mobile terminal to the filter efficiency of refuse messages and reduce accounting for for mobile terminal With space, and Cloud Server corresponds to different mobile terminal and is stored with corresponding privately owned short message training set and classified lexicon, makes rubbish The filtering of rubbish short message has personalization, and then improves the filtering accuracy of refuse messages.
Referring to Fig. 6, an embodiment of the mobile terminal of the present invention includes:
Classification module 301, configured to classify a pending short message according to the classification lexicon stored in the mobile terminal to obtain a classification result, and to classify subsequent pending short messages according to the updated classification lexicon. For the specific implementation, refer to the process described for step S101 above, which is not repeated here.
Upload module 302, configured to upload classification error information to the cloud server, so as to update the private short message training set corresponding to the mobile terminal, when the classification result obtained by classification module 301 is judged to be erroneous and the mobile terminal receives an upload instruction for that erroneous classification result. For the specific implementation, refer to the process described for step S102 above, which is not repeated here.
Mobile terminal update module 303, configured to obtain the lexicon update information from the cloud server to synchronously update the classification lexicon stored in the mobile terminal, and to obtain the private blacklist/whitelist update information and/or the public blacklist/whitelist update information from the cloud server to synchronously update the public and/or private blacklist and whitelist stored in the mobile terminal. For the specific implementation, refer to the process described for step S103 above, which is not repeated here.
Referring to Fig. 7, an embodiment of the cloud server of the present invention includes:
Learning module 401, configured to learn from the private short message training set corresponding to the mobile terminal and from the public short message training set, both stored on the cloud server, to obtain the classification lexicon corresponding to the mobile terminal; and further configured, after the private short message training set and/or the public short message training set is updated, to learn from the private and public short message training sets to obtain lexicon update information, so that the mobile terminal synchronously updates its stored classification lexicon according to the lexicon update information. For the specific implementation, refer to the process described for step S201 above, which is not repeated here.
Cloud server update module 402, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be erroneous and the mobile terminal receives an upload instruction for that erroneous classification result, and to add the pending short message contained in the classification error information to the private short message training set corresponding to the mobile terminal, thereby updating that training set; and further configured to determine whether to add the sender number to the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist that it stores and, if so, to update those lists to obtain private blacklist/whitelist update information and/or public blacklist/whitelist update information. For the specific implementation, refer to the process described for step S202 above, which is not repeated here.
Referring to Fig. 8, an embodiment of the short message filtering system of the present invention includes a mobile terminal and a cloud server:
The mobile terminal includes: a private blacklist and whitelist, a public blacklist and whitelist, a classification lexicon, a word segmentation dictionary, a stop-word dictionary, a private blacklist/whitelist filtering module 501, a public blacklist/whitelist filtering module 502, a classification module 503, an upload module 504, and a mobile terminal update module 505. The private blacklist and whitelist, the public blacklist and whitelist, the classification lexicon, the word segmentation dictionary and the stop-word dictionary are all kept synchronized with the cloud server by mobile terminal update module 505.
Private blacklist/whitelist filtering module 501 and public blacklist/whitelist filtering module 502, configured to filter the pending short message against the private blacklist and whitelist and the public blacklist and whitelist, which provides fast preliminary filtering of spam short messages. For the specific implementation, refer to the blacklist and whitelist filtering steps described above, which are not repeated here.
Classification module 503, configured, when the sender number of the pending short message is in neither the public nor the private blacklist and whitelist, first to preprocess the pending short message according to the word segmentation dictionary and the stop-word dictionary to obtain word features and rule features, and then to classify the pending short message according to the classification lexicon stored in the mobile terminal to obtain a classification result. For the specific implementation, refer to the process described for step S101 above, which is not repeated here.
Upload module 504, configured to upload classification error information to the cloud server, so as to update the private short message training set and the private blacklist and whitelist corresponding to the mobile terminal, when the classification result of classification module 503 is erroneous and the mobile terminal receives an upload instruction for that erroneous classification result. For the specific implementation, refer to the process described for step S102 above, which is not repeated here.
Mobile terminal update module 505, configured to obtain the public blacklist/whitelist update information and/or the private blacklist/whitelist update information from the cloud server to synchronously update the public and/or private blacklist and whitelist stored in the mobile terminal; further configured to obtain the lexicon update information from the cloud server to synchronously update the classification lexicon stored in the mobile terminal; and further configured to obtain the word segmentation dictionary update information and/or the stop-word dictionary update information from the cloud server to synchronously update the word segmentation dictionary and/or the stop-word dictionary stored in the mobile terminal. For the specific implementation, refer to the process described for step S103 above, which is not repeated here.
The cloud server includes: a word segmentation dictionary, a stop-word dictionary, a public short message training set, private short message training sets, a public blacklist and whitelist, private blacklists and whitelists, classification lexicons, a learning module 506, and a cloud server update module 507. The word segmentation dictionary, the stop-word dictionary, the public short message training set and the public blacklist and whitelist are shared by all mobile terminals in the spam filtering system, whereas the private short message training set, the private blacklist and whitelist, and the classification lexicon correspond to an individual mobile terminal and differ from one mobile terminal to another.
Learning module 506, configured to learn, according to the word segmentation dictionary and the stop-word dictionary stored on the cloud server, from the public short message training set stored on the cloud server and/or the private short message training set corresponding to the mobile terminal, to obtain the classification lexicon corresponding to the mobile terminal; and further configured, after the public short message training set and/or the private short message training set is updated, to learn from the public short message training set and/or the private short message training set to obtain lexicon update information, so that the mobile terminal synchronously updates its stored classification lexicon according to the lexicon update information. For the specific implementation, refer to the process described for step S201 above, which is not repeated here.
Cloud server update module 507, configured to receive the classification error information uploaded by the mobile terminal; further configured to add the pending short message to the private short message training set corresponding to the mobile terminal, thereby updating that training set; and further configured to update the public blacklist and whitelist and/or the private blacklist and whitelist to obtain public blacklist/whitelist update information and/or private blacklist/whitelist update information. For the specific implementation, refer to the process described for step S202 above, which is not repeated here.
The public short message training set stores a certain number of short messages that have already been classified as spam or non-spam. The statistics that the cloud server derives from this set, namely the matching frequency in spam short messages of the word features and rule features corresponding to the short messages in the set, the number of spam short messages in the set, the matching frequency of those features in non-spam short messages, and the number of non-spam short messages in the set, may be stored in the public short message training set itself or in another storage location on the cloud server, such as learning module 506.
The private short message training set stores the classified spam and non-spam short messages uploaded by the mobile terminal. Similarly, the statistics that the cloud server derives from it, such as the matching frequency of the word features and rule features corresponding to its short messages, may be stored in the private short message training set itself or in another storage location on the cloud server, such as learning module 506.
The classification lexicon stores what the cloud server obtains by learning from the private short message training set corresponding to the mobile terminal and from the public short message training set: the matching probability of each word feature and rule feature in spam short messages and in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
The word segmentation dictionary stores the meaningful word features that short messages may contain. The stop-word dictionary stores word features that do not contribute to short message classification, including single characters produced by word segmentation, interjections, modal particles, pronouns and the like.
The public blacklist and whitelist store the spam sender numbers that users in general have added to the blacklist and the non-spam sender numbers that have been added to the whitelist. The private blacklist and whitelist store the blacklisted spam sender numbers and the whitelisted non-spam sender numbers corresponding to the mobile terminal.
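To make the roles of the word segmentation dictionary and the stop-word dictionary concrete, the sketch below extracts word features from a message. It is an assumption-laden illustration: the text does not say which segmentation algorithm is used, so forward maximum matching is swapped in as a common choice, and the function names `segment` and `extract_word_features` and the `max_len` parameter are hypothetical. Rule features are not covered.

```python
def segment(text, seg_dict, max_len=4):
    """Forward maximum matching (assumed algorithm, not stated in the text).

    At each position, take the longest substring found in seg_dict;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in seg_dict:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_word_features(text, seg_dict, stop_words, max_len=4):
    """Segment the text, then drop stop words and leftover single characters."""
    return [
        token
        for token in segment(text, seg_dict, max_len)
        if len(token) > 1 and token not in stop_words
    ]
```

Dropping single-character tokens follows the description of the stop-word dictionary, which lists single characters produced by segmentation among the features that do not help classification.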
The short message filtering system of the present invention has a distributed structure: the mobile terminal performs the classification of short messages, while the cloud server, which has greater processing capability and higher processing speed, performs the learning process that classification requires. This improves the efficiency of spam filtering and makes the filtering personalized.
The above are merely embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (11)

1. A method for filtering spam short messages, characterized by comprising:
a mobile terminal classifying a pending short message according to a classification lexicon that it stores to obtain a classification result, wherein the classification result is spam short message or non-spam short message;
when the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, the mobile terminal uploading classification error information to a cloud server to update a private short message training set corresponding to the mobile terminal, wherein the classification error information includes the pending short message and the erroneous classification result;
the mobile terminal obtaining lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, wherein the lexicon update information is obtained by the cloud server learning from the private short message training set and a public short message training set after the private short message training set corresponding to the mobile terminal and/or the public short message training set stored on the cloud server is updated, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
2. The method according to claim 1, characterized in that the step of the mobile terminal classifying the pending short message according to the classification lexicon that it stores to obtain the classification result specifically comprises:
the mobile terminal preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message;
the mobile terminal substituting the proportion of spam short messages P(C1), the proportion of non-spam short messages P(C2), the matching probability P(xk|C1) of the word features and rule features in spam short messages, and the matching probability P(xk|C2) of the word features and rule features in non-spam short messages, all stored in the classification lexicon, into a Bayes classification formula to obtain the probability P(C1|X) that the pending short message is a spam short message, the Bayes classification formula being as follows:
the mobile terminal obtaining the probability P(C2|X) that the pending short message is a non-spam short message, as follows:
P(C2|X) = 1 - P(C1|X)
the mobile terminal obtaining the classification result of the pending short message, wherein when P(C1|X) > P(C2|X) the pending short message is a spam short message, and otherwise the pending short message is a non-spam short message.
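The classification step of claim 2 can be illustrated with a short sketch. The Bayes classification formula itself is not reproduced in this text, so the code assumes the standard two-class naive Bayes form, $P(C_1|X) = \frac{P(C_1)\prod_k P(x_k|C_1)}{P(C_1)\prod_k P(x_k|C_1) + P(C_2)\prod_k P(x_k|C_2)}$, which is consistent with $P(C_2|X) = 1 - P(C_1|X)$ but may differ from the original figure. The dictionary layout of `lexicon`, the function names, the skipping of features absent from the lexicon, and the requirement that stored probabilities be strictly positive are all assumptions for illustration.

```python
import math


def spam_posterior(features, lexicon):
    """Return P(C1|X) under an assumed two-class naive Bayes model.

    lexicon keys (hypothetical layout): "p_spam" = P(C1), "p_ham" = P(C2),
    "p_feature_spam"[x] = P(x|C1), "p_feature_ham"[x] = P(x|C2).
    All stored probabilities are assumed to be strictly positive.
    """
    log_spam = math.log(lexicon["p_spam"])
    log_ham = math.log(lexicon["p_ham"])
    for x in features:
        # Assumption: features missing from the lexicon are ignored.
        if x in lexicon["p_feature_spam"] and x in lexicon["p_feature_ham"]:
            log_spam += math.log(lexicon["p_feature_spam"][x])
            log_ham += math.log(lexicon["p_feature_ham"][x])
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_spam, log_ham)
    spam = math.exp(log_spam - top)
    ham = math.exp(log_ham - top)
    return spam / (spam + ham)


def classify(features, lexicon):
    """Spam when P(C1|X) > P(C2|X), otherwise non-spam, as in claim 2."""
    p_spam = spam_posterior(features, lexicon)
    p_ham = 1.0 - p_spam
    return "spam" if p_spam > p_ham else "non-spam"
```

Working with logarithms is a design choice of this sketch, not something the claim specifies; it gives the same posterior as multiplying the probabilities directly.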
3. The method according to claim 2, characterized in that,
before the step of the mobile terminal preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message, the method further comprises:
the mobile terminal judging whether the sender number of the pending short message is in a private blacklist or whitelist corresponding to the mobile terminal, wherein when the sender number is in the private blacklist corresponding to the mobile terminal, the pending short message is a spam short message, and when the sender number is in the private whitelist corresponding to the mobile terminal, the pending short message is a non-spam short message;
when the sender number is in neither the private blacklist nor the private whitelist corresponding to the mobile terminal, the mobile terminal further judging whether the sender number is in a public blacklist or whitelist, wherein when the sender number is in the public blacklist, the pending short message is a spam short message, and when the sender number is in the public whitelist, the pending short message is a non-spam short message;
when the sender number is in neither the public blacklist nor the public whitelist, the mobile terminal performing the step of preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message.
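The order of checks in claim 3 (private lists first, then public lists, then content-based classification) can be sketched as a small function. The function name `filter_by_lists`, the use of plain sets for the lists, and the convention of returning `None` to mean "continue to preprocessing and classification" are illustrative assumptions.

```python
def filter_by_lists(sender, private_black, private_white, public_black, public_white):
    """Return "spam", "non-spam", or None when no list decides the message."""
    # Private lists take precedence over public ones.
    if sender in private_black:
        return "spam"
    if sender in private_white:
        return "non-spam"
    if sender in public_black:
        return "spam"
    if sender in public_white:
        return "non-spam"
    # Not listed anywhere: fall through to preprocessing and classification.
    return None
```

Because the private lists are consulted first, a user's own whitelist entry overrides a public blacklist entry for the same number.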
4. The method according to claim 3, characterized in that,
when the classification result is judged to be an erroneous classification result and the mobile terminal receives the upload instruction corresponding to the erroneous classification result, the classification error information that the mobile terminal uploads to the cloud server further includes the sender number of the pending short message, the mobile terminal uploading the sender number to the cloud server so that the cloud server judges whether to add the sender number to the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored on the cloud server;
when the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored on the cloud server are updated, the mobile terminal obtaining private blacklist/whitelist update information and/or public blacklist/whitelist update information from the cloud server to synchronously update the public and/or private blacklist and whitelist stored on the mobile terminal.
5. The method according to claim 1 or 4, characterized in that,
the erroneous classification result is that a pending short message that is a spam short message has been classified as a non-spam short message, or that a pending short message that is a non-spam short message has been classified as a spam short message;
the lexicon update information includes at least, after the private short message training set is updated, the matching probability of the word features and rule features of the pending short message in spam short messages or in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
6. A method for filtering spam short messages, characterized by comprising:
a cloud server learning from a private short message training set corresponding to a mobile terminal and from a public short message training set, both of which it stores, to obtain a classification lexicon corresponding to the mobile terminal, the classification lexicon being used by the mobile terminal to classify a pending short message to obtain a classification result, wherein the classification result is spam short message or non-spam short message;
when the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, the cloud server receiving classification error information uploaded by the mobile terminal, wherein the classification error information includes the pending short message and the erroneous classification result;
the cloud server adding the pending short message to the private short message training set corresponding to the mobile terminal to update the private short message training set;
after the private short message training set and/or the public short message training set is updated, the cloud server learning from the private short message training set and the public short message training set to obtain lexicon update information, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
7. The method according to claim 6, characterized in that,
the erroneous classification result is that a pending short message that is a spam short message has been classified as a non-spam short message, or that a pending short message that is a non-spam short message has been classified as a spam short message;
when the erroneous classification result is that a pending short message that is a spam short message has been classified as a non-spam short message, the step of the cloud server learning from the private short message training set and the public short message training set to obtain the lexicon update information after the private short message training set is updated specifically comprises:
the cloud server preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message;
the cloud server obtaining first lexicon update information according to the matching frequency of the word features and rule features in spam short messages in the public short message training set, the matching frequency of the word features and rule features in spam short messages in the private short message training set, and the number of spam short messages and the number of non-spam short messages in the private short message training set and the public short message training set, wherein the first lexicon update information includes, after the private short message training set is updated, the matching probability of the word features and rule features of the pending short message in spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages;
when the erroneous classification result is that a pending short message that is a non-spam short message has been classified as a spam short message, the step of the cloud server learning from the private short message training set and the public short message training set to obtain the lexicon update information after the private short message training set is updated specifically comprises:
the cloud server preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message;
the cloud server obtaining second lexicon update information according to the matching frequency of the word features and rule features in non-spam short messages in the public short message training set, the matching frequency of the word features and rule features in non-spam short messages in the private short message training set, and the number of spam short messages and the number of non-spam short messages in the private short message training set and the public short message training set, wherein the second lexicon update information includes, after the private short message training set is updated, the matching probability of the word features and rule features of the pending short message in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
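One way to picture how lexicon update information could be derived from the quantities named in claim 7 is sketched below. The claim lists the inputs (matching frequencies and message counts in the public and private training sets) and the outputs (matching probabilities and class proportions) but this text does not give the arithmetic, so the pooling of public and private counts, the additive smoothing parameter `alpha`, and the names `TrainingStats` and `lexicon_update` are all assumptions. For brevity the sketch computes both the spam-side and the non-spam-side probabilities, whereas the claim's first and second lexicon update information each carry only one side.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingStats:
    """Hypothetical per-training-set statistics kept by the cloud server."""
    n_spam: int = 0                                  # number of spam short messages
    n_ham: int = 0                                   # number of non-spam short messages
    spam_freq: dict = field(default_factory=dict)    # feature -> spam messages containing it
    ham_freq: dict = field(default_factory=dict)     # feature -> non-spam messages containing it


def lexicon_update(public, private, features, alpha=1.0):
    """Pool public and private counts into probabilities for the given features.

    alpha is an assumed additive-smoothing constant; alpha=0 gives raw ratios.
    The pooled training sets are assumed to be non-empty.
    """
    n_spam = public.n_spam + private.n_spam
    n_ham = public.n_ham + private.n_ham
    total = n_spam + n_ham
    update = {
        "p_spam": n_spam / total,
        "p_ham": n_ham / total,
        "p_feature_spam": {},
        "p_feature_ham": {},
    }
    for x in features:
        spam_hits = public.spam_freq.get(x, 0) + private.spam_freq.get(x, 0)
        ham_hits = public.ham_freq.get(x, 0) + private.ham_freq.get(x, 0)
        update["p_feature_spam"][x] = (spam_hits + alpha) / (n_spam + 2 * alpha)
        update["p_feature_ham"][x] = (ham_hits + alpha) / (n_ham + 2 * alpha)
    return update
```

Smoothing keeps every probability strictly positive, which matters if the result is later multiplied into a Bayes posterior; without it, a feature never seen in one class would force that class's probability to zero.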
8. The method according to claim 7, characterized in that
the classification error information further includes the sender number of the pending short message, and the cloud server judges whether to add the sender number to the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored on the cloud server; if so, the cloud server updates the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist to obtain private blacklist/whitelist update information and/or public blacklist/whitelist update information, so that the mobile terminal synchronously updates the public and/or private blacklist and whitelist stored on the mobile terminal.
9. A mobile terminal, characterized by comprising:
a classification module, configured to classify a pending short message according to a classification lexicon stored on the mobile terminal to obtain a classification result, wherein the classification result is spam short message or non-spam short message, and the classification lexicon is obtained by a cloud server learning from a private short message training set corresponding to the mobile terminal and from a public short message training set, both stored on the cloud server;
an upload module, configured to upload classification error information to the cloud server, so as to update the private short message training set corresponding to the mobile terminal, when the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, wherein the classification error information includes the pending short message and the erroneous classification result;
a mobile terminal update module, configured to obtain lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, wherein the lexicon update information is obtained by the cloud server learning from the private short message training set and the public short message training set after the private short message training set and/or the public short message training set is updated, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
10. A cloud server, characterized by comprising:
a learning module, configured to learn from a private short message training set corresponding to a mobile terminal and from a public short message training set, both stored on the cloud server, to obtain a classification lexicon corresponding to the mobile terminal, the classification lexicon being used by the mobile terminal to classify a pending short message to obtain a classification result, wherein the classification result is spam short message or non-spam short message;
a cloud server update module, configured to receive classification error information uploaded by the mobile terminal when the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, wherein the classification error information includes the pending short message and the erroneous classification result;
the cloud server update module being further configured to add the pending short message to the private short message training set corresponding to the mobile terminal to update the private short message training set;
the learning module being further configured, after the private short message training set and/or the public short message training set is updated, to learn from the private short message training set and the public short message training set to obtain lexicon update information, so that the mobile terminal synchronously updates the classification lexicon stored on the mobile terminal according to the lexicon update information, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
11. A spam short message filtering system, characterized by comprising: the mobile terminal according to claim 9 and the cloud server according to claim 10.
CN201310279728.8A 2013-07-04 2013-07-04 A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server Active CN104284306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310279728.8A CN104284306B (en) 2013-07-04 2013-07-04 A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server


Publications (2)

Publication Number Publication Date
CN104284306A CN104284306A (en) 2015-01-14
CN104284306B true CN104284306B (en) 2018-07-24

Family

ID=52258688


Country Status (1)

Country Link
CN (1) CN104284306B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant