CN104284306A - Junk message filter method and system, mobile terminal and cloud server - Google Patents

Info

Publication number
CN104284306A
Authority
CN
China
Prior art keywords
note
mobile terminal
training set
owned
pending
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310279728.8A
Other languages
Chinese (zh)
Other versions
CN104284306B (en)
Inventor
何通庆
郭伟
方礼勇
杜国楹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eren Eben Information Technology Co Ltd
Original Assignee
Beijing Eren Eben Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eren Eben Information Technology Co Ltd
Priority to CN201310279728.8A (patent CN104284306B)
Publication of CN104284306A
Application granted
Publication of CN104284306B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12: Messaging; Mailboxes; Announcements
    • H04W4/14: Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12: Detection or prevention of fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An embodiment of the invention discloses a junk short message filtering method. In the method, a mobile terminal classifies a short message to be processed according to a classification word bank stored on the mobile terminal to obtain a classification result. When the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to a cloud server to update a private short message training set corresponding to the mobile terminal. The mobile terminal then obtains word bank update information from the cloud server to synchronously update the classification word bank stored on the mobile terminal. Embodiments of the invention further disclose a mobile terminal, a cloud server and a junk short message filtering system. In this way, the efficiency with which the mobile terminal filters junk short messages is improved, and junk short message filtering is personalized.

Description

A junk short message filtering method and system, mobile terminal and cloud server
Technical field
The present invention relates to the field of text classification, and in particular to a junk short message filtering method and system, a mobile terminal and a cloud server.
Background art
With the rapid development of mobile communication technology and the rapid rise in mobile phone penetration, short messages, being brief, fast, easy to use and inexpensive, have increasingly become an important means of communication and bring great convenience to users. At the same time, junk short messages are increasingly rampant. Today, with smartphones spreading quickly and personal information security problems growing more serious, many users are deeply troubled by junk short messages. A junk short message is a short message that the user has not subscribed to, that contains content such as advertising, fraud or pornography, or that is sent repeatedly with the same content within a short time, and that affects the user's normal use of the phone, work and life. Common junk short message content includes advertising, pornography, false prize notifications, fraud and pranks, that is, information of no value to the user. Such messages cause users considerable annoyance, so monitoring and filtering of junk short messages is urgently needed. The prior art mainly includes two methods of filtering junk short messages: one performs the processing at a short message processing center such as a short message service center (SMSC); the other performs the entire junk short message filtering process on a mobile terminal such as a mobile phone, using a purpose-written embedded program.
In the course of long-term research and development, the present inventors found that some information, such as lottery information, ticket information or advertising, may be junk for some users but not for others. Filtering at the short message service center may prevent misclassified messages from ever reaching the user's mobile terminal, and such filtering does not take the differing needs of different users into account. In addition, because the computing speed and storage space of a mobile terminal are both limited, performing the entire junk short message filtering process on the mobile terminal consumes excessive time and space and affects the user's normal reception of short messages.
Summary of the invention
The technical problem mainly solved by the present invention is to provide a junk short message filtering method and system, a mobile terminal and a cloud server that can improve the efficiency with which the mobile terminal filters junk short messages and make the filtering personalized.
To solve the above technical problem, a first aspect of the present invention provides a junk short message filtering method, comprising: a mobile terminal classifies a short message to be processed according to the classification word bank it stores to obtain a classification result, wherein the classification result is junk short message or non-junk short message; when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to a cloud server to update the private short message training set corresponding to the mobile terminal, wherein the classification error information comprises the short message to be processed and the wrong classification result; and the mobile terminal obtains word bank update information from the cloud server to synchronously update the classification word bank stored on the mobile terminal, wherein the word bank update information is obtained by the cloud server learning from the private short message training set and the public short message training set after the private short message training set corresponding to the mobile terminal and/or the public short message training set stored on the cloud server has been updated.
The step in which the mobile terminal classifies the short message to be processed according to its stored classification word bank to obtain the classification result specifically comprises: the mobile terminal preprocesses the short message to be processed to obtain the word features and rule features corresponding to it; the mobile terminal substitutes the proportion of junk short messages $P(C_1)$ and the proportion of non-junk short messages $P(C_2)$ stored in the classification word bank, together with the matching probabilities of the word features and rule features in junk short messages $P(x_k \mid C_1)$ and in non-junk short messages $P(x_k \mid C_2)$, into the Bayesian classification formula to obtain the probability $P(C_1 \mid X)$ that the short message to be processed is a junk short message. The Bayesian classification formula is as follows:
$$P(C_1 \mid X) = \frac{P(C_1) \prod_{k=1}^{n} P(x_k \mid C_1)}{\sum_{h=1}^{2} \left[ P(C_h) \prod_{k=1}^{n} P(x_k \mid C_h) \right]}$$
The mobile terminal obtains the probability $P(C_2 \mid X)$ that the short message to be processed is a non-junk short message, as follows:
$$P(C_2 \mid X) = 1 - P(C_1 \mid X)$$
The mobile terminal obtains the classification result of the short message to be processed: when $P(C_1 \mid X) > P(C_2 \mid X)$, the short message to be processed is a junk short message; otherwise it is a non-junk short message.
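The classification step above can be illustrated with a minimal Python sketch. It is not taken from the patent: the dictionary layout of `word_bank`, the label names `"junk"` and `"non_junk"`, the `floor` probability for features missing from the word bank, and the use of log space to avoid numerical underflow are all assumptions made for illustration.

```python
import math

def classify(features, word_bank):
    """Return (label, p_junk) for a message given its matched features."""
    # Work in log space so a long product of small probabilities does not underflow.
    log_scores = {}
    for label in ("junk", "non_junk"):
        score = math.log(word_bank["prior"][label])           # log P(C_h)
        for feature in features:
            # Assumption: features absent from the word bank get a small floor value.
            p = word_bank["match"][label].get(feature, word_bank["floor"])
            score += math.log(p)                               # + log P(x_k | C_h)
        log_scores[label] = score
    # Normalise the two scores, which is the denominator of the Bayesian formula.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    p_junk = weights["junk"] / sum(weights.values())           # P(C_1 | X)
    label = "junk" if p_junk > 1.0 - p_junk else "non_junk"    # compare with P(C_2 | X)
    return label, p_junk

# Hypothetical word bank with two features.
word_bank = {
    "prior": {"junk": 0.4, "non_junk": 0.6},
    "match": {
        "junk": {"prize": 0.5, "has_url": 0.8},
        "non_junk": {"prize": 0.1, "has_url": 0.2},
    },
    "floor": 1e-3,
}
label, p_junk = classify(["prize", "has_url"], word_bank)
```

With these illustrative numbers the numerator is 0.4 × 0.5 × 0.8 = 0.16 and the denominator is 0.16 + 0.6 × 0.1 × 0.2 = 0.172, so the message is classified as junk.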
Before the step in which the mobile terminal preprocesses the short message to be processed to obtain its word features and rule features, the method further comprises: the mobile terminal judges whether the sender number of the short message to be processed is in the private blacklist or whitelist corresponding to the mobile terminal, wherein if the sender number is in the private blacklist the short message to be processed is a junk short message, and if the sender number is in the private whitelist the short message to be processed is a non-junk short message; when the sender number is in neither private list, the mobile terminal goes on to judge whether the sender number is in the public blacklist or whitelist, wherein if the sender number is in the public blacklist the short message to be processed is a junk short message, and if the sender number is in the public whitelist the short message to be processed is a non-junk short message; when the sender number is in neither public list, the mobile terminal performs the step of preprocessing the short message to be processed to obtain its word features and rule features.
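The list lookup described above can be sketched as a short cascade. This is an illustrative sketch only: the function name, the use of Python sets, and the choice to test the blacklist before the whitelist within each pair are assumptions, since the text does not say what happens if a number appears in both.

```python
def check_lists(sender, private_black, private_white, public_black, public_white):
    """Return 'junk', 'non_junk', or None when no list contains the sender."""
    # Private lists are consulted first, so the user's own lists take
    # precedence over the shared public lists.
    if sender in private_black:
        return "junk"
    if sender in private_white:
        return "non_junk"
    if sender in public_black:
        return "junk"
    if sender in public_white:
        return "non_junk"
    # Not listed anywhere: continue with preprocessing and Bayesian classification.
    return None
```

For example, a number that is on the public blacklist but on the user's private whitelist is treated as non-junk, because the private lists are checked first.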
When the classification result is judged to be a wrong classification result and the mobile terminal receives the upload instruction corresponding to the wrong classification result, the classification error information that the mobile terminal uploads to the cloud server further comprises the sender number of the short message to be processed. The mobile terminal uploads the sender number so that it can be judged whether to add the sender number to the private black and white lists corresponding to the mobile terminal and/or the public black and white lists stored on the cloud server. When the private black and white lists corresponding to the mobile terminal and/or the public black and white lists stored on the cloud server are updated, the mobile terminal obtains the private list update information and/or public list update information from the cloud server to synchronously update the public and/or private black and white lists stored on the mobile terminal.
The wrong classification result is either a junk short message classified as a non-junk short message or a non-junk short message classified as a junk short message. The word bank update information comprises at least the following values after the private short message training set is updated: the matching probabilities of the word features and rule features of the short message to be processed in junk or non-junk short messages, the proportion of junk short messages, and the proportion of non-junk short messages.
To solve the above technical problem, a second aspect of the present invention provides a junk short message filtering method, comprising: a cloud server learns from the private short message training set corresponding to a mobile terminal and the public short message training set, both stored on the cloud server, to obtain a classification word bank corresponding to the mobile terminal, the classification word bank being used by the mobile terminal to classify a short message to be processed to obtain a classification result, wherein the classification result is junk short message or non-junk short message; when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the cloud server receives the classification error information uploaded by the mobile terminal, wherein the classification error information comprises the short message to be processed and the wrong classification result; the cloud server adds the short message to be processed to the private short message training set corresponding to the mobile terminal to update the private short message training set; and after the private short message training set and/or the public short message training set is updated, the cloud server learns from the private short message training set and the public short message training set to obtain word bank update information.
The wrong classification result is either a junk short message classified as a non-junk short message or a non-junk short message classified as a junk short message. When the wrong classification result is a junk short message classified as a non-junk short message, the step in which, after the private short message training set is updated, the cloud server learns from the private and public short message training sets to obtain the word bank update information specifically comprises: the cloud server preprocesses the short message to be processed to obtain its word features and rule features; the cloud server obtains first word bank update information from the matching frequencies of the word features and rule features in junk short messages in the public short message training set, their matching frequencies in junk short messages in the private short message training set, and the numbers of junk and non-junk short messages in the private and public short message training sets, wherein the first word bank update information comprises, after the private short message training set is updated, the matching probabilities of the word features and rule features of the short message to be processed in junk short messages, the proportion of junk short messages, and the proportion of non-junk short messages.
When the wrong classification result is a non-junk short message classified as a junk short message, the step in which, after the private short message training set is updated, the cloud server learns from the private and public short message training sets to obtain the word bank update information specifically comprises: the cloud server preprocesses the short message to be processed to obtain its word features and rule features; the cloud server obtains second word bank update information from the matching frequencies of the word features and rule features in non-junk short messages in the public short message training set, their matching frequencies in non-junk short messages in the private short message training set, and the numbers of junk and non-junk short messages in the private and public short message training sets, wherein the second word bank update information comprises, after the private short message training set is updated, the matching probabilities of the word features and rule features of the short message to be processed in non-junk short messages, the proportion of junk short messages, and the proportion of non-junk short messages.
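As an illustration of how the first word bank update information might be derived from the listed inputs, the sketch below pools the public and private counts. The passage names the inputs and outputs but not the exact arithmetic, so the pooled-ratio estimate used here (matching messages divided by total junk messages, with no smoothing) is an assumption, as are the function name and dictionary keys.

```python
def first_update_info(features, pub, priv):
    """Build update info after a junk message was added to the private set.

    pub and priv are dicts with the keys:
      "junk_count", "non_junk_count": numbers of messages of each class
      "junk_freq": {feature: number of junk messages matching the feature}
    """
    junk_total = pub["junk_count"] + priv["junk_count"]
    non_junk_total = pub["non_junk_count"] + priv["non_junk_count"]
    total = junk_total + non_junk_total
    match_in_junk = {}
    for feature in features:
        # Pool the matching frequencies of the public and private sets.
        hits = pub["junk_freq"].get(feature, 0) + priv["junk_freq"].get(feature, 0)
        match_in_junk[feature] = hits / junk_total        # estimate of P(x_k | C_1)
    return {
        "match_in_junk": match_in_junk,
        "p_junk": junk_total / total,                      # estimate of P(C_1)
        "p_non_junk": non_junk_total / total,              # estimate of P(C_2)
    }

# Hypothetical counts; the private set already includes the newly added message.
pub = {"junk_count": 39, "non_junk_count": 59, "junk_freq": {"lottery": 9}}
priv = {"junk_count": 1, "non_junk_count": 1, "junk_freq": {"lottery": 1}}
info = first_update_info(["lottery"], pub, priv)
```

The second word bank update information would follow the same pattern with non-junk frequencies and the non-junk total in place of the junk ones.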
The classification error information further comprises the sender number of the short message to be processed. The cloud server judges whether to add the sender number to the private black and white lists corresponding to the mobile terminal and/or the public black and white lists stored on the cloud server; if so, the cloud server updates the private black and white lists corresponding to the mobile terminal and/or the public black and white lists to obtain private list update information and/or public list update information, so that the mobile terminal synchronously updates the public and/or private black and white lists it stores.
To solve the above technical problem, a third aspect of the present invention provides a mobile terminal, comprising: a classification module, configured to classify a short message to be processed according to the classification word bank stored on the mobile terminal to obtain a classification result, wherein the classification result is junk short message or non-junk short message, and the classification word bank is obtained by a cloud server learning from the private short message training set corresponding to the mobile terminal and the public short message training set, both stored on the cloud server; an upload module, configured to upload classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction corresponding to the wrong classification result, wherein the classification error information comprises the short message to be processed and the wrong classification result; and a mobile terminal update module, configured to obtain word bank update information from the cloud server to synchronously update the classification word bank stored on the mobile terminal, wherein the word bank update information is obtained by the cloud server learning from the private and public short message training sets after the private short message training set and/or the public short message training set is updated.
To solve the above technical problem, a fourth aspect of the present invention provides a cloud server, comprising: a learning module, configured to learn from the private short message training set corresponding to a mobile terminal and the public short message training set, both stored on the cloud server, to obtain a classification word bank corresponding to the mobile terminal, the classification word bank being used by the mobile terminal to classify a short message to be processed to obtain a classification result, wherein the classification result is junk short message or non-junk short message; and a cloud server update module, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction corresponding to the wrong classification result, wherein the classification error information comprises the short message to be processed and the wrong classification result. The cloud server update module is further configured to add the short message to be processed to the private short message training set corresponding to the mobile terminal to update the private short message training set. The learning module is further configured to learn from the private and public short message training sets after the private short message training set and/or the public short message training set is updated, to obtain word bank update information, so that the mobile terminal synchronously updates the classification word bank it stores according to the word bank update information.
To solve the above technical problem, a fifth aspect of the present invention provides a junk short message filtering system, comprising the mobile terminal described above and the cloud server described above.
The beneficial effects of the present invention are as follows. In contrast to the prior art, in the present invention the mobile terminal classifies the short message to be processed according to the classification word bank it stores to obtain a classification result; when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal; and the mobile terminal obtains word bank update information from the cloud server to synchronously update the classification word bank it stores. By using the processing capability of the cloud server to relearn from the updated private short message training set and the public short message training set, the invention provides the mobile terminal with a classification word bank that reflects both the individual user's preferences and general characteristics. This continuously improves the accuracy with which the mobile terminal filters junk short messages, improves the efficiency of that filtering, and makes junk short message filtering personalized.
Brief description of the drawings
Fig. 1 is a flow chart of the first embodiment of the junk short message filtering method of the present invention;
Fig. 2 is a flow chart of the mobile terminal classifying the short message to be processed according to its stored classification word bank to obtain the classification result, in the first embodiment of the junk short message filtering method of the present invention;
Fig. 3 is a flow chart of the second embodiment of the junk short message filtering method of the present invention;
Fig. 4 is a flow chart, in the second embodiment of the junk short message filtering method of the present invention, of the cloud server learning from the private and public short message training sets to obtain the word bank update information when the wrong classification result is a junk short message classified as a non-junk short message;
Fig. 5 is a flow chart, in the second embodiment of the junk short message filtering method of the present invention, of the cloud server learning from the private and public short message training sets to obtain the word bank update information when the wrong classification result is a non-junk short message classified as a junk short message;
Fig. 6 is a schematic block diagram of an embodiment of the mobile terminal of the present invention;
Fig. 7 is a schematic block diagram of an embodiment of the cloud server of the present invention;
Fig. 8 is a schematic block diagram of an embodiment of the junk short message filtering system of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the first embodiment of the junk short message filtering method of the present invention comprises:
Step S101: classify the short message to be processed to obtain a classification result;
The mobile terminal classifies the short message to be processed according to the classification word bank it stores to obtain the corresponding classification result, wherein the classification result is junk short message or non-junk short message. The classification word bank stored on the mobile terminal is kept synchronized with the classification word bank stored on the cloud server, which the cloud server obtains by learning from the private short message training set corresponding to the mobile terminal and the public short message training set, both stored on the cloud server. The private short message training set corresponding to the mobile terminal may be empty, or may store classified junk and/or non-junk short messages uploaded by the mobile terminal. When the private short message training set is empty, the classification word bank is obtained by the cloud server learning from the public short message training set and the empty private short message training set, that is, from the public short message training set only. When the private short message training set is not empty, the classification word bank is obtained by the cloud server learning from both the private short message training set and the public short message training set. The cloud server stores one public short message training set and multiple private short message training sets, each private short message training set corresponding to one mobile terminal.
The public short message training set stores a certain number of classified junk and non-junk short messages, and all mobile terminals served by the cloud server share the one public short message training set. Each private short message training set stores the classified junk and non-junk short messages uploaded by its mobile terminal, and different mobile terminals correspond to different private short message training sets.
Step S102: upload classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal;
After the mobile terminal obtains the classification result of the short message to be processed, the user judges whether that classification result is wrong, where a wrong classification result is either a junk short message classified as a non-junk short message or a non-junk short message classified as a junk short message. A given short message may be junk for some users but non-junk for others, so different users may reach different judgments about whether the classification result of the same short message is correct.
When the user judges the classification result to be wrong and the mobile terminal receives an upload instruction corresponding to that wrong classification result, the mobile terminal uploads classification error information to the cloud server in accordance with the upload instruction, so that the cloud server updates the private short message training set corresponding to this mobile terminal. The classification error information comprises the short message to be processed and the corresponding wrong classification result.
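The upload in step S102 could be represented by a small record such as the one below. This is a hedged sketch: the class name, the `terminal_id` field used to select the private training set, the JSON encoding and the `build_upload` helper are all hypothetical, and the transport to the cloud server is left out.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ClassificationError:
    terminal_id: str    # hypothetical identifier selecting the private training set
    message_text: str   # the short message to be processed
    wrong_result: str   # "junk" or "non_junk", as produced by the terminal

    def corrected_label(self):
        # With only two classes, the correct label is the opposite of the wrong one.
        return "non_junk" if self.wrong_result == "junk" else "junk"

def build_upload(error, upload_instruction_received):
    """Serialise the error only if the user has issued the upload instruction."""
    if not upload_instruction_received:
        return None
    return json.dumps(asdict(error), ensure_ascii=False)
```

Returning `None` when no upload instruction was received mirrors the condition in the text that nothing is uploaded unless the user both judges the result wrong and asks for the upload.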
Step S103: obtain word bank update information from the cloud server to synchronously update the classification word bank stored on the mobile terminal.
The mobile terminal obtains word bank update information from the cloud server to synchronously update the classification word bank it stores. The word bank update information is obtained by the cloud server learning from the private and public short message training sets after the private short message training set corresponding to the mobile terminal and/or the public short message training set stored on the cloud server is updated. In other words, it is obtained whenever at least one of the two training sets is updated, that is, in any of the following three cases: (1) the public short message training set is updated; (2) the private short message training set is updated; (3) the private and public short message training sets are updated at the same time. The cloud server may periodically add a certain number of classified junk and/or non-junk short messages to the public short message training set to update it. When the private short message training set is empty, that is, it stores no classified short messages uploaded by the mobile terminal, and has not been updated, the word bank update information is obtained by the cloud server learning from the updated public short message training set and the empty private short message training set, that is, from the updated public short message training set only. When the private short message training set is not empty, the word bank update information is obtained by the cloud server learning from the private and public short message training sets after the private and/or public short message training set is updated. After the cloud server obtains the word bank update information through learning, the mobile terminal downloads it from the cloud server over GPRS, WiFi or a similar connection. To update the classification word bank it stores, the mobile terminal only needs to download the word bank update information, which is small, rather than the entire updated classification word bank from the cloud server; this reduces the data traffic needed for the update. The mobile terminal classifies subsequent short messages to be processed according to the updated classification word bank, forming a loop.
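The incremental update described above can be sketched as a merge of a small delta into the stored word bank. The dictionary layout and key names are assumptions carried over for illustration; the patent does not specify a storage format.

```python
def apply_update(word_bank, update_info):
    """Merge word bank update information into the stored word bank in place."""
    # The class proportions are replaced outright.
    word_bank["prior"]["junk"] = update_info["p_junk"]
    word_bank["prior"]["non_junk"] = update_info["p_non_junk"]
    # Only the features present in the update are overwritten or added;
    # every other entry of the stored word bank is left untouched.
    for label, probs in update_info["match"].items():
        word_bank["match"][label].update(probs)
    return word_bank
```

Because only the changed proportions and matching probabilities are transferred, the download size depends on the number of features in the corrected message rather than on the size of the whole word bank.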
The present invention uses the processing capability of the cloud server to update the private short message training set corresponding to the mobile terminal according to the classification error information uploaded by the mobile terminal. After the private short message training set and/or the public short message training set is updated, the cloud server relearns using the word segmentation dictionary and the stop-word dictionary, and this further learning provides the mobile terminal with a classification word bank that reflects both general characteristics and the individual user's preferences. This continuously improves the speed and accuracy with which the mobile terminal filters junk short messages and improves its filtering efficiency. It also gives the mobile terminal personalized junk short message filtering, meeting the different filtering needs of different users.
Refer to Fig. 2, in method for filtering spam short messages first execution mode of the present invention, mobile terminal is classified to obtain classification results according to its classified lexicon stored to pending note and is specifically comprised following sub-step:
Sub-step S1011: preliminary treatment is carried out to obtain word feature corresponding to pending note and rule feature to pending note;
Mobile terminal carries out preliminary treatment to obtain word feature corresponding to pending note and rule feature to pending note, specifically comprises:
Mobile terminal carries out participle to pending note, pending note is divided into significant word feature one by one by the participle dictionary stored by inquiring about it, wherein, Chinese word segmentation Chinese short message text segmentation is become Chinese minimum, can independent activities, significant language element and entry; For English short message text, according to the such as space of the separation mark between word, English short message text is separated into word feature one by one.The segmenting method of present embodiment is Word Intelligent Segmentation method, namely utilizes HMM (Hidden Markov Model, HMM) algorithm.In other embodiments, point method such as morphology, rule-based point of morphology of Dictionary based segment method, cutting labelling method, Corpus--based Method also can be utilized to carry out participle, do not make too many restrictions herein.
The mobile terminal deletes, according to its stored stop-word dictionary, the word features that do not contribute to message classification. These include single characters formed after segmentation, interjections, modal particles, pronouns, and the like.
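The segmentation and stop-word steps for English text can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `STOP_WORDS` set and the function name `word_features` are hypothetical, and Chinese text would need a real segmenter (for example an HMM-based one) in place of the delimiter split.

```python
import re

# Hypothetical stop-word dictionary; the patent does not list its contents.
STOP_WORDS = {"the", "a", "oh", "you", "it"}

def word_features(text, stop_words=STOP_WORDS):
    """Split English text on non-word delimiters, then drop stop words."""
    tokens = [t.lower() for t in re.split(r"\W+", text) if t]
    return [t for t in tokens if t not in stop_words]

print(word_features("Oh, you WIN a free prize"))
```

The remaining tokens are the candidate word features passed on to feature selection.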
After deleting the non-contributing word features, the mobile terminal further selects, from the remaining word features, those that contribute most to message classification. It does so by calculating, for each remaining word feature A, the mutual information MI(A; C) between whether A occurs and the class C, where C comprises two classes: spam message C1 and non-spam message C2. The mutual information MI(A; C) is calculated as follows:
$$MI(A;C) = \sum_{x \in \{0,1\},\; c \in \{c_1, c_2\}} P(A=x, C=c) \log \frac{P(A=x, C=c)}{P(A=x)\,P(C=c)}$$
The word features with the highest mutual information MI(A; C) are then selected as the word features used for classification.
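As a rough illustration of this selection step, the sketch below estimates MI(A; C) from a small labeled corpus by counting. The corpus, the labels `"spam"` and `"ham"`, and the function names are assumptions made for the example; terms with zero joint probability are skipped, following the usual convention that 0 · log 0 = 0.

```python
import math

def mutual_information(docs, labels, word):
    """MI(A; C) between word presence A in {0, 1} and the class C."""
    n = len(docs)
    mi = 0.0
    for x in (False, True):
        p_x = sum(1 for d in docs if (word in d) == x) / n
        for c in set(labels):
            p_c = sum(1 for l in labels if l == c) / n
            joint = sum(1 for d, l in zip(docs, labels)
                        if (word in d) == x and l == c) / n
            if joint > 0:  # skip empty cells (0 * log 0 is taken as 0)
                mi += joint * math.log(joint / (p_x * p_c))
    return mi

def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: mutual_information(docs, labels, w),
                  reverse=True)[:k]

# Toy corpus: each message is the set of word features left after stop-word removal.
docs = [{"free", "prize", "hello"}, {"free", "win"},
        {"lunch", "today", "hello"}, {"meeting", "today"}]
labels = ["spam", "spam", "ham", "ham"]
```

In this toy corpus "free" occurs only in spam and so carries the maximum information, while "hello" occurs equally in both classes and carries none.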
The mobile terminal obtains the rule features of the pending message. The rule features comprise the message length, whether the message contains a URL, whether it contains a telephone number, and whether the sender's number is a mobile phone number.
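The four rule features listed above could be extracted along the following lines. The regular expressions are assumptions chosen for illustration (the patent does not define what counts as a URL, a telephone number, or a mobile phone number); the mobile pattern assumes an 11-digit mainland-China number with an optional country code.

```python
import re

# All three patterns are illustrative assumptions, not taken from the patent.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\d{7,}")               # 7 or more consecutive digits
MOBILE_RE = re.compile(r"^(\+?86)?1\d{10}$")   # 11 digits starting with 1

def rule_features(text, sender):
    """Return the rule features of a message as a dictionary."""
    return {
        "length": len(text),
        "has_url": bool(URL_RE.search(text)),
        "has_phone": bool(PHONE_RE.search(text)),
        "sender_is_mobile": bool(MOBILE_RE.match(sender)),
    }

print(rule_features("Call 13800138000 or visit http://example.com", "10086"))
```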
The pending message X is expressed as $X = \{x_1, x_2, \ldots, x_n\}$, where $x_k$ ($k = 1, 2, \ldots, n$) are the word features and rule features corresponding to the pending message.
Sub-step S1012: substitute the proportion of spam messages, the proportion of non-spam messages, and the matching probabilities of the word features and rule features in spam messages and in non-spam messages into the Bayesian classification formula;
The mobile terminal substitutes the following values stored in the classification lexicon into the Bayesian classification formula: the proportion of spam messages $P(C_1)$, the proportion of non-spam messages $P(C_2)$, and, for the word features and rule features $x_k$ corresponding to the pending message, the matching probability $P(x_k|C_1)$ in spam messages and the matching probability $P(x_k|C_2)$ in non-spam messages. This yields the probability $P(C_1|X)$ that the pending message is a spam message. The Bayesian classification formula is as follows:
$$P(C_1|X) = \frac{P(C_1) \prod_{k=1}^{n} P(x_k|C_1)}{\sum_{h=1}^{2} \left[ P(C_h) \prod_{k=1}^{n} P(x_k|C_h) \right]}$$
Here, the proportion of spam messages $P(C_1)$ is the ratio of the number of spam messages to the number of all messages (spam and non-spam) in the private SMS training set corresponding to the mobile terminal and the public SMS training set. The proportion of non-spam messages $P(C_2)$ is the ratio of the number of non-spam messages to the number of all messages in the same two training sets. The classification lexicon corresponding to the mobile terminal stores $P(C_1)$, $P(C_2)$, and the matching probabilities $P(x_k|C_1)$ and $P(x_k|C_2)$ of the word features and rule features in spam and non-spam messages. Different mobile terminals correspond to different classification lexicons.
Sub-step S1013: obtain the probability that the pending message is a non-spam message;
The mobile terminal further obtains the probability $P(C_2|X)$ that the pending message is a non-spam message, as follows:
$$P(C_2|X) = 1 - P(C_1|X)$$
In other embodiments, the Bayesian classification formula may also be used to obtain the probability that the pending message is a non-spam message; no particular limitation is imposed here.
Sub-step S1014: obtain the classification result of the pending message.
The mobile terminal obtains the classification result of the pending message from the probability $P(C_1|X)$ that it is a spam message and the probability $P(C_2|X)$ that it is a non-spam message: when $P(C_1|X) > P(C_2|X)$, the pending message is classified as a spam message; otherwise it is classified as a non-spam message. Alternatively, the classification may be made by judging whether $P(C_1|X)$ is greater than 0.5: the pending message is spam when $P(C_1|X)$ is greater than 0.5 and non-spam otherwise. Because the two probabilities sum to 1, the two tests give the same result.
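Sub-steps S1012 to S1014 can be illustrated with the short sketch below. The lexicon layout (the keys `prior`, `match`, and `floor`) and the numbers in it are hypothetical. In particular, the `floor` probability used for features missing from the lexicon is an assumption of this sketch, since the text does not say how such features are handled.

```python
def spam_posterior(features, lexicon):
    """Return P(C1|X) from the Bayesian classification formula."""
    joint = {}
    for cls in ("spam", "ham"):
        p = lexicon["prior"][cls]                      # P(C_h)
        for x in features:
            # P(x_k | C_h); unseen features use an assumed floor value.
            p *= lexicon["match"][cls].get(x, lexicon["floor"])
        joint[cls] = p
    return joint["spam"] / (joint["spam"] + joint["ham"])

def classify(features, lexicon):
    """Classify as spam when P(C1|X) > P(C2|X), otherwise as non-spam."""
    p_spam = spam_posterior(features, lexicon)
    p_ham = 1.0 - p_spam                               # P(C2|X) = 1 - P(C1|X)
    return "spam" if p_spam > p_ham else "ham"

# Hypothetical classification lexicon with made-up probabilities.
lexicon = {
    "prior": {"spam": 0.4, "ham": 0.6},
    "match": {
        "spam": {"free": 0.5, "has_url": 0.8},
        "ham": {"free": 0.1, "has_url": 0.2, "lunch": 0.3},
    },
    "floor": 1e-3,
}

print(classify(["free", "has_url"], lexicon))
```

For long messages a production implementation would normally sum logarithms instead of multiplying probabilities, to avoid floating-point underflow; the direct product is kept here to mirror the formula.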
When the classification result is judged to be wrong and the mobile terminal receives an upload instruction for that wrong classification result, at least the private SMS training set corresponding to the mobile terminal stored on the cloud server is updated. The resulting lexicon update information comprises at least the following values after the private SMS training set is updated: the matching probabilities in spam or non-spam messages of the word features and rule features corresponding to the pending message, the proportion of spam messages $P(C_1)$, and the proportion of non-spam messages $P(C_2)$. Specifically, when the error is that a spam pending message was classified as non-spam, the matching probabilities of the word features and rule features in spam messages, $P(C_1)$, and $P(C_2)$ are updated. When the error is that a non-spam pending message was classified as spam, the matching probabilities of the word features and rule features in non-spam messages, $P(C_1)$, and $P(C_2)$ are updated.
In addition, before the step in which the mobile terminal preprocesses the pending message to obtain its word features and rule features, the method further comprises:
The mobile terminal judges whether the sender's number of the pending message is in the private blacklist or whitelist corresponding to the mobile terminal. When the sender's number is in the private blacklist, the pending message is a spam message; when the sender's number is in the private whitelist, the pending message is a non-spam message.
When the sender's number is in neither the private blacklist nor the private whitelist, the mobile terminal goes on to judge whether the sender's number is in the public blacklist or whitelist. When the sender's number is in the public blacklist, the pending message is a spam message; when the sender's number is in the public whitelist, the pending message is a non-spam message.
When the sender's number is in neither the public blacklist nor the public whitelist, the mobile terminal performs the above step of preprocessing the pending message to obtain its word features and rule features, namely sub-step S1011.
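The order of these checks (private lists first, then public lists, then the classifier) can be sketched as below. The dictionary layout with `"black"` and `"white"` keys and the sample numbers are assumptions for illustration only.

```python
def prefilter(sender, private_lists, public_lists):
    """Return 'spam' or 'ham' if a list decides, else None (go on to S1011)."""
    for lists in (private_lists, public_lists):   # private lists take precedence
        if sender in lists["black"]:
            return "spam"
        if sender in lists["white"]:
            return "ham"
    return None

# Hypothetical list contents.
private_lists = {"black": {"10690000"}, "white": {"13800138000"}}
public_lists = {"black": {"13800138000", "95500000"}, "white": set()}

print(prefilter("13800138000", private_lists, public_lists))
```

In the usage line, the number appears in the private whitelist and the public blacklist, and the private whitelist wins because it is consulted first.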
After the mobile terminal performs step S101 above and obtains the classification result of the pending message, when the classification result is judged to be wrong and the mobile terminal receives an upload instruction for that wrong classification result, the classification error information that the mobile terminal uploads to the cloud server also includes the sender's number of the pending message. The mobile terminal uploads the sender's number so that the cloud server can judge whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server; if so, the cloud server updates those lists. Specifically, after the mobile terminal uploads the sender's number of the pending message, the cloud server first adds the sender's number to the private blacklist or whitelist corresponding to the mobile terminal, and adds it to the public blacklist or whitelist once reports of that sender's number reach a certain count. For example, when more than a predetermined number of users, such as 10,000, report a sender's number, that number is added to the public blacklist; when more than another predetermined number of users, such as 100, report a sender's number and the message content clearly contains illegal content, that number is likewise added to the public blacklist.
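One possible reading of this promotion rule is sketched below. The class name `ListServer`, its attributes, and the `illicit` flag are hypothetical; how the cloud server decides that content is clearly illegal is not described in the text, so the flag is simply passed in. The default thresholds are the example values 10,000 and 100 given above, and reports are counted per distinct terminal, which is also an assumption.

```python
from collections import defaultdict

class ListServer:
    """Tracks spam reports and promotes senders to the public blacklist."""

    def __init__(self, general_threshold=10_000, illicit_threshold=100):
        self.general_threshold = general_threshold
        self.illicit_threshold = illicit_threshold
        self.private_black = defaultdict(set)   # terminal id -> sender numbers
        self.public_black = set()
        self._reporters = defaultdict(set)      # sender number -> terminal ids
        self._illicit = set()                   # senders flagged for illegal content

    def report(self, terminal_id, sender, illicit=False):
        """Record one report; return True if sender is publicly blacklisted."""
        self.private_black[terminal_id].add(sender)   # private list first
        self._reporters[sender].add(terminal_id)
        if illicit:
            self._illicit.add(sender)
        n = len(self._reporters[sender])              # distinct reporting terminals
        if n > self.general_threshold or (
            sender in self._illicit and n > self.illicit_threshold
        ):
            self.public_black.add(sender)
        return sender in self.public_black
```

A quick check with small thresholds, for example `ListServer(general_threshold=3, illicit_threshold=1)`, shows a sender being promoted on the fourth ordinary report or on the second report flagged as illegal.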
When the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server is updated, the mobile terminal obtains the private and/or public list update information from the cloud server via GPRS, WiFi, or the like, in order to synchronously update the public and/or private blacklist and whitelist stored on the mobile terminal. The mobile terminal then uses the updated lists to judge subsequent pending messages. For example, after a pending message is correctly judged to be spam, or after a spam pending message is wrongly classified as non-spam, the sender's number of that message is uploaded to the cloud server, and the cloud server adds the sender's number to the private blacklist or whitelist corresponding to the mobile terminal.
It can be understood that, in the first embodiment of the spam SMS filtering method of the present invention, the mobile terminal classifies messages using the classification lexicon that the cloud server obtains by learning from the private SMS training set corresponding to the mobile terminal and the public SMS training set. When a classification result is wrong, the mobile terminal uploads classification error information so that the classification lexicon corresponding to the mobile terminal is updated in time. The mobile terminal can therefore classify messages without itself learning from message samples, which improves its spam filtering efficiency. Moreover, different mobile terminals correspond to different private SMS training sets and classification lexicons, which makes spam filtering personalized and improves filtering accuracy.
In addition, the present invention uses the word segmentation dictionary and the stop-word dictionary to obtain the word features of a message, and also obtains rule features such as the message length, whether the message contains a URL, whether it contains a telephone number, and whether the sender's number is a mobile phone number. By substituting the matching probabilities of the word features and rule features into the Bayesian classification formula, the probability that the pending message is spam is calculated directly and more accurately, and the judgment is made quickly. The calculation is simple, fast, and efficient, which greatly reduces the processing workload of the mobile terminal.
Referring to Fig. 3, the second embodiment of the spam SMS filtering method of the present invention comprises:
Step S201: learn from the private SMS training set and the public SMS training set;
The cloud server learns from its stored private SMS training set corresponding to the mobile terminal and from the public SMS training set to obtain the classification lexicon corresponding to the mobile terminal. The private SMS training set corresponding to the mobile terminal may be empty, or may store classified spam and/or non-spam messages uploaded by the mobile terminal. When the private SMS training set is empty, the cloud server in effect learns only from the public SMS training set to obtain the classification lexicon; when it is not empty, the cloud server learns from both the private and the public SMS training sets. This learning specifically comprises the following. The cloud server preprocesses the private SMS training set corresponding to the mobile terminal and the public SMS training set according to its stored word segmentation dictionary and stop-word dictionary, to obtain the word features and rule features corresponding to each spam message and non-spam message in the two sets. It then obtains, from the number of spam messages and the number of non-spam messages, the matching probability of each word feature and rule feature in spam messages and in non-spam messages, the proportion of spam messages, and the proportion of non-spam messages.
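The learning step can be approximated by simple counting, as in the sketch below. It assumes that each training set is a list of `(features, label)` pairs already produced by preprocessing, that labels are `"spam"` or `"ham"`, and that the public set is non-empty; the function name `learn` and the lexicon layout are hypothetical.

```python
from collections import Counter

def learn(private_set, public_set):
    """Build a classification lexicon from the private and public training sets."""
    samples = list(public_set) + list(private_set)   # an empty private set is fine
    count = Counter(label for _, label in samples)   # messages per class
    freq = {"spam": Counter(), "ham": Counter()}     # matching frequency per class
    for features, label in samples:
        freq[label].update(set(features))            # count each feature once per message
    total = sum(count.values())
    return {
        "prior": {c: count[c] / total for c in ("spam", "ham")},
        "match": {c: {x: n / count[c] for x, n in freq[c].items()}
                  for c in ("spam", "ham")},
    }

public = [(["free", "win"], "spam"), (["lunch"], "ham"), (["free"], "ham")]
print(learn([], public)["prior"])
```

With an empty private set the result depends on the public set alone, which matches the case described above; adding private samples shifts both the proportions and the matching probabilities for that terminal.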
When the word segmentation dictionary and/or stop-word dictionary stored on the cloud server is updated, the word segmentation dictionary and/or stop-word dictionary stored on the mobile terminal is updated synchronously with the cloud server. The classification lexicon is used by the mobile terminal to classify pending messages and obtain a classification result, where the classification result is either spam message or non-spam message. The cloud server stores, for each mobile terminal, the classification lexicon corresponding to that mobile terminal. The public SMS training set stores a certain number of classified spam and non-spam messages.
Before the first pending message is classified, the mobile terminal may upload a certain number of messages that the user has judged to be spam or non-spam to the private SMS training set corresponding to the mobile terminal stored on the cloud server. Alternatively, the private SMS training set corresponding to the mobile terminal may be empty when spam filtering first starts. Before the first pending message is classified, the mobile terminal obtains the classification lexicon corresponding to the mobile terminal stored on the cloud server via GPRS, WiFi, or the like, in order to perform classification.
Step S202: receive the classification error information uploaded by the mobile terminal;
When the user judges that the classification result obtained by the mobile terminal is wrong and the mobile terminal receives an upload instruction for that wrong classification result, the cloud server receives the classification error information uploaded by the mobile terminal. The classification error information comprises the pending message and the wrong classification result, where the wrong classification result is either a spam pending message classified as non-spam or a non-spam pending message classified as spam.
Step S203: add the pending message to the private SMS training set;
The cloud server adds the pending message in the classification error information to the private SMS training set corresponding to the mobile terminal, thereby updating that training set. When the wrong classification result uploaded by the mobile terminal is a spam pending message classified as non-spam, the cloud server adds the pending message to the spam class of the private SMS training set. When the wrong classification result is a non-spam pending message classified as spam, the cloud server adds the pending message to the non-spam class of the private SMS training set.
Step S204: learn from the private SMS training set and the public SMS training set.
After the private SMS training set and/or the public SMS training set is updated, the cloud server learns from the private and public SMS training sets to obtain lexicon update information. There are two cases: (1) when the private SMS training set is empty, that is, it stores no classified messages uploaded by the mobile terminal, and it has not been updated, the lexicon update information is obtained by the cloud server learning from the updated public SMS training set; (2) when the private SMS training set is not empty, the lexicon update information is obtained by the cloud server learning from both the public and the private SMS training sets after either or both of them are updated. The mobile terminal synchronously updates its stored classification lexicon according to the lexicon update information, and the classification lexicon on the cloud server is updated in the same way; the lexicon update information may be stored in the classification lexicon corresponding to the mobile terminal on the cloud server. Before a pending message is classified, the matching probabilities of each word feature and rule feature in spam and non-spam messages in the classification lexicon stored on the mobile terminal are kept synchronized with the classification lexicon corresponding to the mobile terminal stored on the cloud server.
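On the terminal side, applying lexicon update information amounts to overwriting the affected entries of the stored classification lexicon. The sketch below assumes a hypothetical update format with the keys `label`, `match`, and `prior`; the actual wire format exchanged between the cloud server and the mobile terminal is not specified in the text.

```python
def apply_update(lexicon, update):
    """Merge lexicon update information into the stored classification lexicon."""
    cls = update["label"]                            # class whose probabilities changed
    # dict.update overwrites known features and adds features not yet stored.
    lexicon["match"][cls].update(update["match"])
    lexicon["prior"] = dict(update["prior"])         # new class proportions
    return lexicon

# Hypothetical stored lexicon and update, with made-up numbers.
stored = {
    "prior": {"spam": 0.5, "ham": 0.5},
    "match": {"spam": {"free": 0.5}, "ham": {"lunch": 0.5}},
}
update = {
    "label": "spam",
    "match": {"free": 1.0, "loan": 0.25},
    "prior": {"spam": 0.6, "ham": 0.4},
}
apply_update(stored, update)
```

Only the entries named in the update are touched, so the terminal never needs the training messages themselves.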
Referring to Fig. 4, when the wrong classification result is a spam pending message classified as non-spam, in the second embodiment of the spam SMS filtering method of the present invention, the step in which the cloud server, after the private SMS training set is updated, learns from the private and public SMS training sets to obtain lexicon update information specifically comprises the following sub-steps:
Sub-step S2041a: preprocess the pending message to obtain the word features and rule features corresponding to the pending message;
The cloud server preprocesses the pending message to obtain its word features and rule features. The pending message X is expressed as $X = \{x_1, x_2, \ldots, x_n\}$, where $x_k$ ($k = 1, 2, \ldots, n$) are the word features and rule features corresponding to the pending message.
Sub-step S2042a: obtain the first lexicon update information from the matching frequencies of the word features and rule features, the number of spam messages, and the number of non-spam messages.
The cloud server obtains the first lexicon update information from the matching frequency in spam messages of the word features and rule features $x_k$ of the pending message in the public SMS training set, the matching frequency in spam messages of $x_k$ in the private SMS training set, and the numbers of spam messages and non-spam messages in the private and public SMS training sets. The first lexicon update information comprises the following values after the private SMS training set corresponding to the mobile terminal is updated: the matching probability in spam messages of the word features and rule features $x_k$ of the pending message, the proportion of spam messages, and the proportion of non-spam messages. The mobile terminal synchronously updates its stored classification lexicon according to the first lexicon update information. That is, it modifies the matching probabilities in spam messages of the word features and rule features $x_k$ stored in the classification lexicon corresponding to the mobile terminal, adds to the classification lexicon the matching probabilities in spam messages of word features not yet contained in it, and modifies the stored proportion of spam messages and proportion of non-spam messages. The matching probability of $x_k$ in spam messages equals the sum of the matching frequency of $x_k$ in spam messages in the public SMS training set and its matching frequency in spam messages in the private SMS training set, divided by the number of spam messages in the private and public SMS training sets.
Referring to Fig. 5, when the wrong classification result is a non-spam pending message classified as spam, in the second embodiment of the spam SMS filtering method of the present invention, the step in which the cloud server, after the private SMS training set is updated, learns from the private and public SMS training sets to obtain lexicon update information specifically comprises the following sub-steps:
Sub-step S2041b: preprocess the pending message to obtain the word features and rule features corresponding to the pending message;
The cloud server preprocesses the pending message to obtain its word features and rule features.
Sub-step S2042b: obtain the second lexicon update information from the matching frequencies of the word features and rule features, the number of spam messages, and the number of non-spam messages.
The cloud server obtains the second lexicon update information from the matching frequency in non-spam messages of the word features and rule features $x_k$ of the pending message in the public SMS training set, the matching frequency in non-spam messages of $x_k$ in the private SMS training set, and the numbers of spam messages and non-spam messages in the private and public SMS training sets. The second lexicon update information comprises the following values after the private SMS training set corresponding to the mobile terminal is updated: the matching probability in non-spam messages of the word features and rule features $x_k$ of the pending message, the proportion of spam messages, and the proportion of non-spam messages. The mobile terminal synchronously updates its stored classification lexicon according to the second lexicon update information. That is, it modifies the matching probabilities in non-spam messages of the word features and rule features $x_k$ stored in the classification lexicon corresponding to the mobile terminal, adds to the classification lexicon the matching probabilities in non-spam messages of word features not yet contained in it, and modifies the stored proportion of spam messages and proportion of non-spam messages. The matching probability of $x_k$ in non-spam messages equals the sum of the matching frequency of $x_k$ in non-spam messages in the public SMS training set and its matching frequency in non-spam messages in the private SMS training set, divided by the number of non-spam messages in the private and public SMS training sets.
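Both update cases reduce to the same arithmetic: add the public and private matching frequencies for the corrected class and divide by the number of messages in that class. The sketch below illustrates this under assumed names (`TrainingSet`, `lexicon_update`) and labels (`"spam"`, `"ham"`); it recomputes only the features of the newly added message, in line with the text.

```python
from collections import Counter

class TrainingSet:
    """Keeps per-class message counts and per-class feature matching frequencies."""

    def __init__(self):
        self.count = {"spam": 0, "ham": 0}
        self.match = {"spam": Counter(), "ham": Counter()}

    def add(self, features, label):
        self.count[label] += 1
        self.match[label].update(set(features))   # one match per message

def lexicon_update(public, private, features, label):
    """Lexicon update information after a message was added under `label`."""
    n_cls = public.count[label] + private.count[label]
    n_spam = public.count["spam"] + private.count["spam"]
    n_ham = public.count["ham"] + private.count["ham"]
    return {
        "label": label,
        # (public frequency + private frequency) / number of messages in the class
        "match": {x: (public.match[label][x] + private.match[label][x]) / n_cls
                  for x in set(features)},
        "prior": {"spam": n_spam / (n_spam + n_ham),
                  "ham": n_ham / (n_spam + n_ham)},
    }

public, private = TrainingSet(), TrainingSet()
public.add(["free", "win"], "spam")
public.add(["free"], "spam")
public.add(["lunch"], "ham")
public.add(["meeting"], "ham")

# A spam message wrongly classified as non-spam is added to the private spam class.
features = ["free", "loan"]
private.add(features, "spam")
first_update = lexicon_update(public, private, features, "spam")
```

Calling the same function with the label `"ham"` gives the second lexicon update information for a non-spam message wrongly classified as spam.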
When the public SMS training set is updated, the update comprises adding spam messages, adding non-spam messages, or adding both. As with the updating of and learning from the private SMS training set described above, the cloud server preprocesses the newly added messages in the public SMS training set and then obtains the corresponding lexicon update information from the matching frequencies of the word features and rule features, the number of spam messages, and the number of non-spam messages. This information is used to update the matching probabilities of the word features and rule features in spam and/or non-spam messages, the proportion of spam messages, and the proportion of non-spam messages. When the private and public SMS training sets are updated at the same time, the procedure is the same as described above for each set and is not repeated here.
In other words, when the cloud server learns from the private SMS training set corresponding to the mobile terminal and the public SMS training set, it obtains the matching probabilities of the word features and rule features in spam and non-spam messages, the proportion of spam messages, and the proportion of non-spam messages from the matching frequencies of the word features and rule features in spam and non-spam messages, the number of spam messages, and the number of non-spam messages. The resulting matching probabilities and proportions are stored in the classification lexicon; different mobile terminals correspond to different classification lexicons. When the private and/or public SMS training set is updated, the cloud server only needs to preprocess the newly added messages, because the word features and rule features of each message already in the sets before the update are retained. This improves the efficiency of preprocessing and learning on the cloud server, and hence the efficiency of updating the classification lexicon.
In addition, the classification error information that the cloud server receives from the mobile terminal also includes the sender's number of the pending message. After receiving the sender's number, the cloud server judges whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server. If so, the cloud server updates those lists to obtain private and/or public list update information, so that the mobile terminal can synchronously update the public and/or private blacklist and whitelist that it stores. The public and private list update information comprises the sender's number and the list to which that number has been added. For example, when more than a predetermined number of users, such as 10,000, report a sender's number, that number is added to the public blacklist; when more than another predetermined number of users, such as 100, report a sender's number and the message content clearly contains illegal content, that number is likewise added to the public blacklist. As a further example, after a pending message is correctly judged to be spam, or after a spam pending message is wrongly classified as non-spam, the sender's number of that message is uploaded to the cloud server, and the cloud server adds the sender's number to the private blacklist or whitelist corresponding to the mobile terminal.
It can be understood that, in the second embodiment of the spam SMS filtering method of the present invention, the cloud server learns from its stored private SMS training set corresponding to the mobile terminal and from the public SMS training set to obtain the classification lexicon corresponding to the mobile terminal, and the mobile terminal classifies pending messages according to that classification lexicon. After receiving classification error information uploaded by the mobile terminal, the cloud server learns again and obtains lexicon update information, so that the mobile terminal synchronously updates its stored classification lexicon. Because the cloud server stores the public and private SMS training sets, which occupy considerable space, and performs the computationally expensive learning process, the spam filtering efficiency of the mobile terminal is improved and the storage used on the mobile terminal is reduced. Moreover, the cloud server stores a separate private SMS training set and classification lexicon for each mobile terminal, which makes spam filtering personalized and improves filtering accuracy.
Referring to Fig. 6, an embodiment of the mobile terminal of the present invention comprises:
A classification module 301, configured to classify a pending message according to the classification lexicon stored on the mobile terminal to obtain a classification result, and to classify subsequent pending messages according to the updated classification lexicon. For the specific implementation, refer to the implementation process of step S101 above; details are not repeated here.
An upload module 302, configured to upload classification error information to the cloud server to update the private SMS training set corresponding to the mobile terminal, when the classification result obtained by the classification module 301 is judged to be wrong and the mobile terminal receives an upload instruction for that wrong classification result. For the specific implementation, refer to the implementation process of step S102 above; details are not repeated here.
A mobile terminal update module 303, configured to obtain the lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, and to obtain the private and/or public list update information from the cloud server to synchronously update the public and/or private blacklist and whitelist stored on the mobile terminal. For the specific implementation, refer to the implementation process of step S103 above; details are not repeated here.
Referring to Fig. 7, an embodiment of the cloud server of the present invention comprises:
A learning module 401, configured to learn from the private SMS training set corresponding to the mobile terminal and the public SMS training set stored on the cloud server to obtain the classification lexicon corresponding to the mobile terminal, and, after the private and/or public SMS training set is updated, to learn from the private and public SMS training sets to obtain lexicon update information, so that the mobile terminal synchronously updates its stored classification lexicon according to the lexicon update information. For the specific implementation, refer to the implementation process of step S201 above; details are not repeated here.
A cloud server update module 402, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be wrong and the mobile terminal receives an upload instruction for that wrong classification result, and to add the pending message in the classification error information to the private SMS training set corresponding to the mobile terminal to update that training set. The module is also configured to judge whether to add the sender's number to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist that it stores; if so, the cloud server update module 402 updates those lists to obtain private and/or public list update information. For the specific implementation, refer to the implementation process of step S202 above; details are not repeated here.
Refer to Fig. 8, filtering short message system one execution mode of the present invention comprises mobile terminal and server:
The mobile terminal comprises: a private blacklist and whitelist, a public blacklist and whitelist, a classification lexicon, a word segmentation dictionary, a stop-word dictionary, a private blacklist and whitelist filtering module 501, a public blacklist and whitelist filtering module 502, a classification module 503, an upload module 504 and a mobile terminal update module 505. The private blacklist and whitelist, the public blacklist and whitelist, the classification lexicon, the word segmentation dictionary and the stop-word dictionary are all kept synchronized with the cloud server by the mobile terminal update module 505.
The private blacklist and whitelist filtering module 501 and the public blacklist and whitelist filtering module 502 are configured to filter the pending message against the private and public blacklists and whitelists, which provides a fast preliminary filtering of spam messages. For the specific implementation, refer to the process corresponding to the blacklist and whitelist filtering steps above; it is not repeated here.
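The list-filtering order described for modules 501 and 502 (private lists first, then public lists, then fall through to the classifier) can be sketched as follows. This is only an illustrative sketch: the names `Verdict` and `filter_by_lists` and the use of in-memory sets are assumptions, not part of the patent.

```python
from enum import Enum
from typing import Optional, Set


class Verdict(Enum):
    SPAM = "spam"
    NOT_SPAM = "not_spam"


def filter_by_lists(sender: str,
                    private_black: Set[str], private_white: Set[str],
                    public_black: Set[str], public_white: Set[str]) -> Optional[Verdict]:
    """Return a verdict if the sender number is listed, else None.

    None means the message falls through to the lexicon-based classifier.
    All names here are hypothetical; the patent does not prescribe an API.
    """
    # The private lists of this terminal are consulted first.
    if sender in private_black:
        return Verdict.SPAM
    if sender in private_white:
        return Verdict.NOT_SPAM
    # Only when the sender is in neither private list are the public lists checked.
    if sender in public_black:
        return Verdict.SPAM
    if sender in public_white:
        return Verdict.NOT_SPAM
    return None
```

Checking the private lists before the public ones lets a user's own choice override the shared lists, which matches the order given in the filtering steps.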
The classification module 503 is configured, when the sender of the pending message is in neither the public nor the private blacklist and whitelist, first to preprocess the pending message according to the word segmentation dictionary and the stop-word dictionary to obtain its word features and rule features, and then to classify the pending message according to the classification lexicon stored in the mobile terminal to obtain a classification result. For the specific implementation, refer to the process corresponding to step S101 above; it is not repeated here.
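The classification step can be illustrated with a minimal sketch of the two-class Bayes formula recited in claim 2, using the values that the classification lexicon stores. The function names, the dictionary layout and the fallback probability `default` for features missing from the lexicon are assumptions made for the example; the patent does not specify how unseen features are handled.

```python
from math import prod
from typing import Dict, Iterable


def spam_posterior(features: Iterable[str],
                   p_spam: float,
                   p_ham: float,
                   match_prob_spam: Dict[str, float],
                   match_prob_ham: Dict[str, float],
                   default: float = 1e-6) -> float:
    """Compute P(C1 | X), the probability that the message is spam.

    p_spam / p_ham are the class proportions P(C1) / P(C2); the two
    dictionaries hold P(x_k | C1) and P(x_k | C2) for each feature x_k.
    `default` is a hypothetical fallback for features not in the lexicon.
    """
    feats = list(features)
    like_spam = p_spam * prod(match_prob_spam.get(f, default) for f in feats)
    like_ham = p_ham * prod(match_prob_ham.get(f, default) for f in feats)
    return like_spam / (like_spam + like_ham)


def is_spam(features: Iterable[str], p_spam: float, p_ham: float,
            match_prob_spam: Dict[str, float],
            match_prob_ham: Dict[str, float]) -> bool:
    """Spam if P(C1 | X) > P(C2 | X), where P(C2 | X) = 1 - P(C1 | X)."""
    p1 = spam_posterior(features, p_spam, p_ham, match_prob_spam, match_prob_ham)
    return p1 > 1.0 - p1
```

For long messages a production implementation would normally sum logarithms instead of multiplying probabilities to avoid underflow; the direct product is kept here so that the code mirrors the formula.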
The upload module 504 is configured, when the classification result of the classification module 503 is an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, to upload classification error information to the cloud server so as to update the private SMS training set and the private blacklist and whitelist corresponding to the mobile terminal. For the specific implementation, refer to the process corresponding to step S102 above; it is not repeated here.
The mobile terminal update module 505 is configured to obtain the public and/or private blacklist and whitelist update information from the cloud server so as to synchronously update the public and/or private blacklists and whitelists stored in the mobile terminal; to obtain the lexicon update information from the cloud server so as to synchronously update the classification lexicon stored in the mobile terminal; and to obtain the word segmentation dictionary update information and/or stop-word dictionary update information from the cloud server so as to synchronously update the word segmentation dictionary and/or stop-word dictionary stored in the mobile terminal. For the specific implementation, refer to the process corresponding to step S103 above; it is not repeated here.
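One possible way for the terminal to apply lexicon update information to its locally stored classification lexicon is sketched below. The dictionary layout (keys `p_spam`, `p_ham`, `match_prob_spam`, `match_prob_ham`) and the function name are assumptions for illustration; the patent only states that the update carries feature matching probabilities and the class proportions.

```python
from typing import Any, Dict


def apply_lexicon_update(lexicon: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new lexicon with the server's update merged in.

    Entries present in the update replace the local ones; entries the
    update does not mention are kept. The input lexicon is not modified.
    """
    return {
        "p_spam": update.get("p_spam", lexicon["p_spam"]),
        "p_ham": update.get("p_ham", lexicon["p_ham"]),
        "match_prob_spam": {**lexicon["match_prob_spam"],
                            **update.get("match_prob_spam", {})},
        "match_prob_ham": {**lexicon["match_prob_ham"],
                           **update.get("match_prob_ham", {})},
    }
```

Sending only the changed entries, rather than the whole lexicon, keeps each synchronization small, which suits a mobile terminal.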
The cloud server comprises: a word segmentation dictionary, a stop-word dictionary, a public SMS training set, private SMS training sets, a public blacklist and whitelist, private blacklists and whitelists, classification lexicons, a learning module 506 and a cloud server update module 507. The word segmentation dictionary, the stop-word dictionary, the public training set and the public blacklist and whitelist are shared by all mobile terminals in the spam filtering system, whereas each private training set, private blacklist and whitelist and classification lexicon corresponds to an individual mobile terminal and differs from one terminal to another.
The learning module 506 is configured to learn, according to the word segmentation dictionary and the stop-word dictionary stored by the cloud server, from the public training set stored by the cloud server and/or the private training set corresponding to the mobile terminal, to obtain the classification lexicon corresponding to the mobile terminal. It is also configured, after the public training set and/or the private training set is updated, to learn from the public and/or private training set to obtain lexicon update information, so that the mobile terminal synchronously updates its stored classification lexicon according to the lexicon update information. For the specific implementation, refer to the process corresponding to step S201 above; it is not repeated here.
The cloud server update module 507 is configured to receive the classification error information uploaded by the mobile terminal; to add the pending message to the private training set corresponding to the mobile terminal so as to update that private training set; and to update the public and/or private blacklist and whitelist to obtain public and/or private blacklist and whitelist update information. For the specific implementation, refer to the process corresponding to step S202 above; it is not repeated here.
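The handling of an uploaded classification error by the cloud server update module can be sketched as follows. The classes `ErrorReport` and `PrivateTrainingSet`, the label strings and the function name are hypothetical. The sketch assumes that the corrected label is simply the opposite of the erroneous result, since there are only two classes; the criteria for adding the sender number to a blacklist or whitelist are not given in this section and are therefore left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LABELS = ("spam", "not_spam")


@dataclass
class ErrorReport:
    text: str          # the pending message that was misclassified
    wrong_label: str   # the erroneous classification result
    sender: str = ""   # optional sender number (unused in this sketch)


@dataclass
class PrivateTrainingSet:
    # (message text, corrected label) pairs for one mobile terminal
    samples: List[Tuple[str, str]] = field(default_factory=list)


def apply_error_report(report: ErrorReport, training_set: PrivateTrainingSet) -> str:
    """Add the misclassified message to the private training set with the corrected label."""
    if report.wrong_label not in LABELS:
        raise ValueError(f"unknown label: {report.wrong_label!r}")
    corrected = "not_spam" if report.wrong_label == "spam" else "spam"
    training_set.samples.append((report.text, corrected))
    return corrected
```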
The public SMS training set stores a number of already-classified spam messages and non-spam messages. The statistics that the cloud server derives for the messages in the public training set (the match frequency of the word features and rule features in the spam messages of the public training set, the number of spam messages in the public training set, the match frequency of the word features and rule features in the non-spam messages of the public training set, and the number of non-spam messages in the public training set) may be stored in the public training set itself or in another storage location of the cloud server, such as the learning module 506.
The private SMS training set stores the already-classified spam messages and non-spam messages uploaded by the mobile terminal. Likewise, the statistics that the cloud server derives for the messages in the private training set, such as the match frequency of the word features and rule features in the private training set, may be stored in the private training set itself or in another storage location of the cloud server, such as the learning module 506.
The classification lexicon stores the values that the cloud server learns from the private training set corresponding to the mobile terminal and the public training set: the matching probabilities of the word features and rule features in spam messages and in non-spam messages, and the proportions of spam messages and of non-spam messages.
The word segmentation dictionary stores the meaningful word features that may occur in a message. The stop-word dictionary stores word features that do not contribute to message classification, including single characters left over after word segmentation, interjections, modal particles, pronouns and the like.
The public blacklist and whitelist generally store the spam sender numbers that users have added to the blacklist and the non-spam sender numbers that users have added to the whitelist. The private blacklist and whitelist store the spam sender numbers added to the blacklist and the non-spam sender numbers added to the whitelist for the corresponding mobile terminal.
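How the stored statistics could be turned into a classification lexicon is sketched below. This is an assumption-laden illustration: it pools the public and private training sets and estimates each matching probability as match frequency divided by the number of messages in the class. The exact estimator is not stated in this section, and the names `TrainingStats` and `build_lexicon` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrainingStats:
    spam_count: int = 0                                          # number of spam messages
    ham_count: int = 0                                           # number of non-spam messages
    spam_matches: Dict[str, int] = field(default_factory=dict)   # feature -> match frequency in spam
    ham_matches: Dict[str, int] = field(default_factory=dict)    # feature -> match frequency in non-spam


def build_lexicon(public: TrainingStats, private: TrainingStats) -> dict:
    """Derive class proportions and per-feature matching probabilities."""
    n_spam = public.spam_count + private.spam_count
    n_ham = public.ham_count + private.ham_count
    if n_spam == 0 or n_ham == 0:
        raise ValueError("both classes need at least one training message")

    def pooled(a: Dict[str, int], b: Dict[str, int], n: int) -> Dict[str, float]:
        # Assumed estimator: pooled match frequency / pooled message count.
        return {f: (a.get(f, 0) + b.get(f, 0)) / n for f in set(a) | set(b)}

    total = n_spam + n_ham
    return {
        "p_spam": n_spam / total,
        "p_ham": n_ham / total,
        "match_prob_spam": pooled(public.spam_matches, private.spam_matches, n_spam),
        "match_prob_ham": pooled(public.ham_matches, private.ham_matches, n_ham),
    }
```

Because the private statistics differ per terminal while the public statistics are shared, the resulting lexicon is specific to each terminal, which is where the personalization of the filter comes from.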
The spam message filtering system of the present invention has a distributed structure: the mobile terminal performs the classification decision for each message, while the cloud server, which has greater processing capability and higher processing speed, performs the learning process that the classification decision requires. This improves the efficiency of spam filtering and makes the filtering personalized.
The foregoing describes only embodiments of the present invention and does not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (11)

1. A method for filtering spam SMS messages, characterized by comprising:
A mobile terminal classifies a pending message according to the classification lexicon that it stores to obtain a classification result, wherein the classification result is spam message or non-spam message;
When the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, the mobile terminal uploads classification error information to a cloud server so as to update the private SMS training set corresponding to the mobile terminal, wherein the classification error information comprises the pending message and the erroneous classification result;
The mobile terminal obtains lexicon update information from the cloud server so as to synchronously update the classification lexicon stored in the mobile terminal, wherein the lexicon update information is obtained by the cloud server by learning from the private training set and the public SMS training set after the private training set corresponding to the mobile terminal and/or the public training set stored by the cloud server is updated.
2. The method according to claim 1, characterized in that the step in which the mobile terminal classifies the pending message according to the classification lexicon that it stores to obtain a classification result specifically comprises:
The mobile terminal preprocesses the pending message to obtain the word features and rule features corresponding to the pending message;
The mobile terminal substitutes the proportion of spam messages $P(C_1)$, the proportion of non-spam messages $P(C_2)$, the matching probability $P(x_k \mid C_1)$ of the word features and rule features in spam messages and their matching probability $P(x_k \mid C_2)$ in non-spam messages, all stored in the classification lexicon, into the Bayes classification formula to obtain the probability $P(C_1 \mid X)$ that the pending message is a spam message, the Bayes classification formula being as follows:
$$P(C_1 \mid X) = \frac{P(C_1)\prod_{k=1}^{n} P(x_k \mid C_1)}{\sum_{h=1}^{2}\left[P(C_h)\prod_{k=1}^{n} P(x_k \mid C_h)\right]}$$
The mobile terminal obtains the probability $P(C_2 \mid X)$ that the pending message is a non-spam message, as follows:
$$P(C_2 \mid X) = 1 - P(C_1 \mid X)$$
The mobile terminal obtains the classification result of the pending message, wherein the pending message is a spam message when $P(C_1 \mid X) > P(C_2 \mid X)$, and is otherwise a non-spam message.
3. The method according to claim 2, characterized in that,
Before the step in which the mobile terminal preprocesses the pending message to obtain the word features and rule features corresponding to the pending message, the method further comprises:
The mobile terminal judges whether the sender number of the pending message is in the private blacklist and whitelist corresponding to the mobile terminal, wherein the pending message is a spam message when the sender number is in the private blacklist corresponding to the mobile terminal, and is a non-spam message when the sender number is in the private whitelist corresponding to the mobile terminal;
When the sender number is not in the private blacklist and whitelist corresponding to the mobile terminal, the mobile terminal goes on to judge whether the sender number is in the public blacklist and whitelist, wherein the pending message is a spam message when the sender number is in the public blacklist, and is a non-spam message when the sender number is in the public whitelist;
When the sender number is not in the public blacklist and whitelist, the mobile terminal performs the step of preprocessing the pending message to obtain the word features and rule features corresponding to the pending message.
4. The method according to claim 3, characterized in that,
When the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, the classification error information that the mobile terminal uploads to the cloud server further comprises the sender number of the pending message, and the mobile terminal uploads the sender number to the cloud server so that the cloud server judges whether to add the sender number to the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored by the cloud server;
When the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored by the cloud server is updated, the mobile terminal obtains the private and/or public blacklist and whitelist update information from the cloud server so as to synchronously update the public and/or private blacklists and whitelists stored in the mobile terminal.
5. The method according to claim 1 or 4, characterized in that,
The erroneous classification result is a pending message that is a spam message being classified as a non-spam message, or a pending message that is a non-spam message being classified as a spam message;
The lexicon update information comprises at least the matching probability, in spam messages or non-spam messages, of the word features and rule features of the pending message after the private training set is updated, the proportion of spam messages and the proportion of non-spam messages.
6. A method for filtering spam SMS messages, characterized by comprising:
A cloud server learns from the private SMS training set corresponding to a mobile terminal and the public SMS training set that it stores to obtain the classification lexicon corresponding to the mobile terminal, the classification lexicon being used by the mobile terminal to classify a pending message to obtain a classification result, wherein the classification result is spam message or non-spam message;
When the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, the cloud server receives the classification error information uploaded by the mobile terminal, wherein the classification error information comprises the pending message and the erroneous classification result;
The cloud server adds the pending message to the private training set corresponding to the mobile terminal so as to update the private training set;
After the private training set and/or the public training set is updated, the cloud server learns from the private training set and the public training set to obtain lexicon update information.
7. The method according to claim 6, characterized in that,
The erroneous classification result is a pending message that is a spam message being classified as a non-spam message, or a pending message that is a non-spam message being classified as a spam message;
When the erroneous classification result is a pending message that is a spam message being classified as a non-spam message, the step in which, after the private training set is updated, the cloud server learns from the private training set and the public training set to obtain lexicon update information specifically comprises:
The cloud server preprocesses the pending message to obtain the word features and rule features corresponding to the pending message;
The cloud server obtains first lexicon update information according to the match frequency of the word features and rule features in the spam messages of the public training set, the match frequency of the word features and rule features in the spam messages of the private training set, and the numbers of spam messages and non-spam messages in the private training set and the public training set, wherein the first lexicon update information comprises the matching probability in spam messages of the word features and rule features of the pending message after the private training set is updated, the proportion of spam messages and the proportion of non-spam messages;
When the erroneous classification result is a pending message that is a non-spam message being classified as a spam message, the step in which, after the private training set is updated, the cloud server learns from the private training set and the public training set to obtain lexicon update information specifically comprises:
The cloud server preprocesses the pending message to obtain the word features and rule features corresponding to the pending message;
The cloud server obtains second lexicon update information according to the match frequency of the word features and rule features in the non-spam messages of the public training set, the match frequency of the word features and rule features in the non-spam messages of the private training set, and the numbers of spam messages and non-spam messages in the private training set and the public training set, wherein the second lexicon update information comprises the matching probability in non-spam messages of the word features and rule features of the pending message after the private training set is updated, the proportion of spam messages and the proportion of non-spam messages.
8. The method according to claim 7, characterized in that,
The classification error information further comprises the sender number of the pending message; the cloud server judges whether to add the sender number to the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist stored by the cloud server, and if so, the cloud server updates the private blacklist and whitelist corresponding to the mobile terminal and/or the public blacklist and whitelist to obtain private and/or public blacklist and whitelist update information, so that the mobile terminal synchronously updates the public and/or private blacklists and whitelists that it stores.
9. A mobile terminal, characterized by comprising:
A classification module, configured to classify a pending message according to the classification lexicon stored in the mobile terminal to obtain a classification result, wherein the classification result is spam message or non-spam message, and the classification lexicon is obtained by a cloud server by learning from the private SMS training set corresponding to the mobile terminal and the public SMS training set that it stores;
An upload module, configured to upload classification error information to the cloud server so as to update the private training set corresponding to the mobile terminal when the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, wherein the classification error information comprises the pending message and the erroneous classification result;
A mobile terminal update module, configured to obtain lexicon update information from the cloud server so as to synchronously update the classification lexicon stored in the mobile terminal, wherein the lexicon update information is obtained by the cloud server by learning from the private training set and the public training set after the private training set and/or the public training set is updated.
10. A cloud server, characterized by comprising:
A learning module, configured to learn from the private SMS training set corresponding to a mobile terminal and the public SMS training set, both stored by the cloud server, to obtain the classification lexicon corresponding to the mobile terminal, the classification lexicon being used by the mobile terminal to classify a pending message to obtain a classification result, wherein the classification result is spam message or non-spam message;
A cloud server update module, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be an erroneous classification result and the mobile terminal receives an upload instruction corresponding to the erroneous classification result, wherein the classification error information comprises the pending message and the erroneous classification result;
The cloud server update module is further configured to add the pending message to the private training set corresponding to the mobile terminal so as to update the private training set;
The learning module is further configured, after the private training set and/or the public training set is updated, to learn from the private training set and the public training set to obtain lexicon update information, so that the mobile terminal synchronously updates the classification lexicon stored in the mobile terminal according to the lexicon update information.
11. A spam SMS message filtering system, characterized by comprising: the mobile terminal according to claim 9 and the cloud server according to claim 10.
CN201310279728.8A 2013-07-04 2013-07-04 A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server Active CN104284306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310279728.8A CN104284306B (en) 2013-07-04 2013-07-04 A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server

Publications (2)

Publication Number Publication Date
CN104284306A true CN104284306A (en) 2015-01-14
CN104284306B CN104284306B (en) 2018-07-24

Family

ID=52258688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310279728.8A Active CN104284306B (en) 2013-07-04 2013-07-04 A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server

Country Status (1)

Country Link
CN (1) CN104284306B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104967981A (en) * 2015-07-06 2015-10-07 王小安 Crank call and text message blocking method
CN105162984A (en) * 2015-09-23 2015-12-16 小米科技有限责任公司 Telephone number identification method and device
CN105307176A (en) * 2015-11-10 2016-02-03 中国科学院信息工程研究所 Routing method for robustness information in mobile social opportunity network
CN107517452A (en) * 2017-09-04 2017-12-26 上海连尚网络科技有限公司 A kind of method, equipment and computer-readable storage medium for being used to manage short message
CN110019773A (en) * 2017-08-14 2019-07-16 中国移动通信有限公司研究院 A kind of refuse messages detection method, terminal and computer readable storage medium
CN112597282A (en) * 2021-01-24 2021-04-02 深圳市诚立业科技发展有限公司 Management method applied to short message data security
CN115065972A (en) * 2022-06-09 2022-09-16 昕新讯飞科技(北京)有限公司 Junk information clearing system and equipment based on communication big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877837A (en) * 2009-04-30 2010-11-03 华为技术有限公司 Method and device for short message filtration
CN102547623A (en) * 2010-12-08 2012-07-04 中国电信股份有限公司 Junk short message processing method and system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104967981A (en) * 2015-07-06 2015-10-07 王小安 Crank call and text message blocking method
CN105162984A (en) * 2015-09-23 2015-12-16 小米科技有限责任公司 Telephone number identification method and device
CN105162984B (en) * 2015-09-23 2018-11-23 小米科技有限责任公司 Telephone number recognition methods and device
CN105307176A (en) * 2015-11-10 2016-02-03 中国科学院信息工程研究所 Routing method for robustness information in mobile social opportunity network
CN105307176B (en) * 2015-11-10 2019-03-08 中国科学院信息工程研究所 Robustness message routing method in a kind of mobile social opportunistic network
CN110019773A (en) * 2017-08-14 2019-07-16 中国移动通信有限公司研究院 A kind of refuse messages detection method, terminal and computer readable storage medium
CN107517452A (en) * 2017-09-04 2017-12-26 上海连尚网络科技有限公司 A kind of method, equipment and computer-readable storage medium for being used to manage short message
WO2019042164A1 (en) * 2017-09-04 2019-03-07 上海连尚网络科技有限公司 Method, apparatus, and computer storage medium for managing sms messages
CN112597282A (en) * 2021-01-24 2021-04-02 深圳市诚立业科技发展有限公司 Management method applied to short message data security
CN112597282B (en) * 2021-01-24 2021-06-11 深圳市诚立业科技发展有限公司 Management method applied to short message data security
CN115065972A (en) * 2022-06-09 2022-09-16 昕新讯飞科技(北京)有限公司 Junk information clearing system and equipment based on communication big data
CN115065972B (en) * 2022-06-09 2024-01-12 通华大数据科技(烟台)有限公司 Junk information clearing system based on communication big data

Also Published As

Publication number Publication date
CN104284306B (en) 2018-07-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant