CN104284306B - Method, system, mobile terminal and cloud server for filtering spam short messages - Google Patents
- Publication number
- CN104284306B CN201310279728.8A CN201310279728A
- Authority
- CN
- China
- Prior art keywords
- short message
- mobile terminal
- training set
- owned
- pending
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Information Transfer Between Computers (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
Embodiments of the present invention disclose a method for filtering spam short messages, including: a mobile terminal classifies a pending short message according to its stored classification lexicon to obtain a classification result; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to a cloud server to update the private short message training set corresponding to the mobile terminal; and the mobile terminal obtains lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal. Embodiments of the present invention also disclose a mobile terminal, a cloud server, and a spam short message filtering system. In this way, the present invention can improve the efficiency with which the mobile terminal filters spam short messages and personalize the filtering.
Description
Technical field
The present invention relates to the field of text classification, and in particular to a method, system, mobile terminal, and cloud server for filtering spam short messages.
Background art
With the rapid development of mobile communication technology and the rapid rise in mobile phone penetration, short messages, being short, fast, convenient, and cheap, have become an important means of communication and bring users great convenience. At the same time, spam short messages are increasingly rampant. Especially now that smartphones are spreading quickly and personal information security problems are growing more severe, many users are deeply troubled by spam short messages. A spam short message is a message that the user did not subscribe to, that contains content such as advertising, deception, or pornography, or that repeats the same content within a short time, and that disturbs the user's normal use, work, and life. Common spam content includes advertising, pornography, false prize notices, fraud, and pranks, that is, information of no value to the user. Such messages cause users a great deal of trouble, so monitoring and filtering of spam short messages is urgently needed.
The prior art mainly includes two methods for filtering spam short messages. One handles filtering at a short message processing center such as a short message service center (SMSC); the other uses an embedded program written for a mobile terminal such as a mobile phone to carry out the entire spam filtering process on the terminal.
In long-term research and development, the inventors of the present application found that some information, such as lottery, ticket, and advertising information, may be spam for some users but not for others. Filtering at the short message service center may prevent misclassified messages from ever reaching the user's mobile terminal, so such filtering fails to account for the differing needs of different users. In addition, because the computing speed and storage space of a mobile terminal are relatively limited, carrying out the entire spam filtering process on the mobile terminal consumes excessive time and space and interferes with the user's normal reception of short messages.
Summary of the invention
The main technical problem solved by the present invention is to provide a method, system, mobile terminal, and cloud server for filtering spam short messages that improve the efficiency with which a mobile terminal filters spam short messages and personalize the filtering.
To solve the above technical problem, a first aspect of the present invention provides a method for filtering spam short messages, including: a mobile terminal classifies a pending short message according to its stored classification lexicon to obtain a classification result, where the classification result is spam short message or non-spam short message; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to a cloud server to update the private short message training set corresponding to the mobile terminal, where the classification error information includes the pending short message and the wrong classification result; and the mobile terminal obtains lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, where the lexicon update information is obtained by the cloud server learning from the private and public short message training sets after the private short message training set corresponding to the mobile terminal and/or the public short message training set stored on the cloud server has been updated.
The step in which the mobile terminal classifies the pending short message according to its stored classification lexicon to obtain a classification result specifically includes: the mobile terminal preprocesses the pending short message to obtain its word features and rule features; the mobile terminal substitutes the proportion of spam short messages P(C1) and the proportion of non-spam short messages P(C2) stored in the classification lexicon, together with the matching probabilities of the word features and rule features in spam short messages P(xk|C1) and in non-spam short messages P(xk|C2), into the Bayes classification formula to obtain the probability P(C1|X) that the pending short message is a spam short message. The Bayes classification formula is as follows:
P(C1|X) = P(C1)·∏k P(xk|C1) / [P(C1)·∏k P(xk|C1) + P(C2)·∏k P(xk|C2)]
The mobile terminal obtains the probability P(C2|X) that the pending short message is a non-spam short message, as follows:
P(C2|X)=1-P(C1|X)
The mobile terminal obtains the classification result of the pending short message: when P(C1|X) > P(C2|X), the pending short message is a spam short message; otherwise it is a non-spam short message.
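The classification step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the usual two-class naive Bayes form, in which each class prior is multiplied by the per-feature matching probabilities and the result is normalised over both classes. The dictionary layout of `lexicon`, the function names, and the `floor` value used for features missing from the lexicon are all hypothetical.

```python
import math

def spam_posterior(features, lexicon, floor=1e-6):
    """Return P(C1|X) for the matched features of one message.

    `lexicon` uses an assumed layout:
      {"p_spam": P(C1), "p_non_spam": P(C2),
       "spam": {feature: P(x_k|C1)}, "non-spam": {feature: P(x_k|C2)}}
    `floor` is an assumed fallback probability for unknown features.
    """
    log_spam = math.log(lexicon["p_spam"])
    log_non_spam = math.log(lexicon["p_non_spam"])
    for f in features:
        log_spam += math.log(lexicon["spam"].get(f, floor))
        log_non_spam += math.log(lexicon["non-spam"].get(f, floor))
    # Normalise in log space so long messages do not underflow.
    m = max(log_spam, log_non_spam)
    spam = math.exp(log_spam - m)
    non_spam = math.exp(log_non_spam - m)
    return spam / (spam + non_spam)

def classify(features, lexicon):
    """Spam when P(C1|X) > P(C2|X), with P(C2|X) = 1 - P(C1|X)."""
    p_spam = spam_posterior(features, lexicon)
    return "spam" if p_spam > 1.0 - p_spam else "non-spam"
```

Working in log space is a design choice of this sketch only; it gives the same decision as multiplying the probabilities directly.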
Before the step in which the mobile terminal preprocesses the pending short message to obtain its word features and rule features, the method further includes: the mobile terminal judges whether the sender number of the pending short message is in the private blacklist or whitelist corresponding to the mobile terminal, where the pending short message is a spam short message when the sender number is in the private blacklist and a non-spam short message when the sender number is in the private whitelist; when the sender number is in neither private list, the mobile terminal goes on to judge whether the sender number is in the public blacklist or whitelist, where the pending short message is a spam short message when the sender number is in the public blacklist and a non-spam short message when the sender number is in the public whitelist; and when the sender number is in neither public list, the mobile terminal performs the step of preprocessing the pending short message to obtain its word features and rule features.
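The list check described above amounts to a short cascade: private lists first, then public lists, then the content-based classifier. A minimal sketch, in which the function name and the dictionary layout of the lists are hypothetical:

```python
def check_sender(sender, private_lists, public_lists):
    """Return 'spam', 'non-spam', or None when no list contains the sender.

    Each argument is an assumed dict of the form
    {"black": set_of_numbers, "white": set_of_numbers}.
    """
    # Private lists are consulted before public lists.
    for lists in (private_lists, public_lists):
        if sender in lists["black"]:
            return "spam"
        if sender in lists["white"]:
            return "non-spam"
    # No match: the caller falls through to preprocessing and classification.
    return None
```

Because the private lists are checked first, a number that one user has whitelisted stays non-spam for that user even if it appears on the public blacklist.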
When the classification result is judged to be wrong and the mobile terminal receives the upload instruction corresponding to the wrong classification result, the classification error information that the mobile terminal uploads to the cloud server further includes the sender number of the pending short message. The mobile terminal uploads the sender number so that the cloud server can judge whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server. When the private lists corresponding to the mobile terminal and/or the public lists stored on the cloud server are updated, the mobile terminal obtains the private list update information and/or public list update information from the cloud server to synchronously update the public and/or private blacklist and whitelist stored on the mobile terminal.
A wrong classification result is one that classifies a pending short message that is spam as non-spam, or one that is non-spam as spam. The lexicon update information includes at least the following, computed after the private short message training set is updated: the matching probabilities of the word features and rule features of the pending short message in spam or non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
To solve the above technical problem, a second aspect of the present invention provides a method for filtering spam short messages, including: a cloud server learns from its stored private short message training set corresponding to a mobile terminal and its public short message training set to obtain a classification lexicon corresponding to the mobile terminal, which the mobile terminal uses to classify a pending short message and obtain a classification result, where the classification result is spam short message or non-spam short message; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the cloud server receives the classification error information uploaded by the mobile terminal, where the classification error information includes the pending short message and the wrong classification result; the cloud server adds the pending short message to the private short message training set corresponding to the mobile terminal to update that set; and after the private and/or public short message training set is updated, the cloud server learns from the private and public short message training sets to obtain lexicon update information.
A wrong classification result is one that classifies a pending short message that is spam as non-spam, or one that is non-spam as spam. When the wrong classification result classifies a spam pending short message as non-spam, the step in which, after the private short message training set is updated, the cloud server learns from the private and public short message training sets to obtain lexicon update information specifically includes: the cloud server preprocesses the pending short message to obtain its word features and rule features; and the cloud server obtains first lexicon update information according to the matching frequency of the word features and rule features in spam short messages in the public short message training set, their matching frequency in spam short messages in the private training set, and the numbers of spam and non-spam short messages in the private and public short message training sets, where the first lexicon update information includes, after the private short message training set is updated, the matching probabilities of the word features and rule features of the pending short message in spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages. When the wrong classification result classifies a non-spam pending short message as spam, the same step specifically includes: the cloud server preprocesses the pending short message to obtain its word features and rule features; and the cloud server obtains second lexicon update information according to the matching frequency of the word features and rule features in non-spam short messages in the public short message training set, their matching frequency in non-spam short messages in the private training set, and the numbers of spam and non-spam short messages in the private and public short message training sets, where the second lexicon update information includes, after the private short message training set is updated, the matching probabilities of the word features and rule features of the pending short message in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
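The cloud-side update described above can be illustrated with the sketch below. The text names the inputs (matching frequencies in the public and private sets, and message counts) and the outputs (matching probabilities and class proportions) but not how they are combined, so the pooling of public and private counts and the add-one smoothing used here are assumptions, as are the function name and data layout.

```python
from collections import Counter

def lexicon_update(features, true_label, public, private):
    """Add one misclassified message to the private set and recompute
    only the statistics affected by it.

    `public` and `private` use an assumed layout:
      {"n": {"spam": int, "non-spam": int},
       "freq": {"spam": Counter, "non-spam": Counter}}
    `true_label` is the correct class: "spam" or "non-spam".
    """
    # Update the private training set with the corrected message.
    private["n"][true_label] += 1
    private["freq"][true_label].update(set(features))

    # Pool public and private counts (assumption, see lead-in).
    n_label = public["n"][true_label] + private["n"][true_label]
    total = sum(public["n"].values()) + sum(private["n"].values())
    probs = {}
    for f in set(features):
        count = public["freq"][true_label][f] + private["freq"][true_label][f]
        probs[f] = (count + 1) / (n_label + 2)  # assumed add-one smoothing
    n_spam = public["n"]["spam"] + private["n"]["spam"]
    return {"label": true_label, "match_prob": probs,
            "p_spam": n_spam / total, "p_non_spam": 1 - n_spam / total}
```

Passing `true_label="spam"` corresponds to the first lexicon update information and `true_label="non-spam"` to the second.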
The classification error information further includes the sender number of the pending short message. The cloud server judges whether to add the sender number to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server. If so, the cloud server updates the private lists corresponding to the mobile terminal and/or the public lists to obtain private list update information and/or public list update information, so that the mobile terminal can synchronously update the public and/or private blacklist and whitelist it stores.
To solve the above technical problem, a third aspect of the present invention provides a mobile terminal, including: a classification module, configured to classify a pending short message according to the classification lexicon stored on the mobile terminal to obtain a classification result, where the classification result is spam short message or non-spam short message, and the classification lexicon is obtained by a cloud server learning from its stored private short message training set corresponding to the mobile terminal and its public short message training set; an upload module, configured to upload classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, where the classification error information includes the pending short message and the wrong classification result; and a mobile terminal update module, configured to obtain lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal, where the lexicon update information is obtained by the cloud server learning from the private and public short message training sets after the private and/or public short message training set is updated.
To solve the above technical problem, a fourth aspect of the present invention provides a cloud server, including: a learning module, configured to learn from the private short message training set corresponding to a mobile terminal and the public short message training set stored on the cloud server to obtain a classification lexicon corresponding to the mobile terminal, which the mobile terminal uses to classify a pending short message and obtain a classification result, where the classification result is spam short message or non-spam short message; and a cloud server update module, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, where the classification error information includes the pending short message and the wrong classification result. The cloud server update module is further configured to add the pending short message to the private short message training set corresponding to the mobile terminal to update that set. The learning module is further configured to learn from the private and public short message training sets after the private and/or public short message training set is updated, to obtain lexicon update information, so that the mobile terminal synchronously updates its stored classification lexicon according to the lexicon update information.
To solve the above technical problem, a fifth aspect of the present invention provides a spam short message filtering system, including the mobile terminal and the cloud server described above.
The beneficial effects of the present invention are as follows. Unlike the prior art, in the present invention the mobile terminal classifies a pending short message according to its stored classification lexicon to obtain a classification result; when the classification result is judged to be wrong and the mobile terminal receives an upload instruction corresponding to the wrong classification result, the mobile terminal uploads classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal; and the mobile terminal obtains lexicon update information from the cloud server to synchronously update its stored classification lexicon. Relying on the powerful processing capability of the cloud server, the updated private short message training set and the public short message training set are learned again, which provides the mobile terminal with a classification lexicon that reflects both personalized and common characteristics. This continuously improves the accuracy and efficiency with which the mobile terminal filters spam short messages and personalizes the filtering.
Brief description of the drawings
Fig. 1 is a flowchart of a first embodiment of the spam short message filtering method of the present invention;
Fig. 2 is a flowchart of the mobile terminal classifying a pending short message according to its stored classification lexicon to obtain a classification result in the first embodiment of the spam short message filtering method of the present invention;
Fig. 3 is a flowchart of a second embodiment of the spam short message filtering method of the present invention;
Fig. 4 is a flowchart of the cloud server learning from the private and public short message training sets to obtain lexicon update information in the second embodiment of the spam short message filtering method of the present invention, when the wrong classification result classifies a spam pending short message as non-spam;
Fig. 5 is a flowchart of the cloud server learning from the private and public short message training sets to obtain lexicon update information in the second embodiment of the spam short message filtering method of the present invention, when the wrong classification result classifies a non-spam pending short message as spam;
Fig. 6 is a functional block diagram of an embodiment of the mobile terminal of the present invention;
Fig. 7 is a functional block diagram of an embodiment of the cloud server of the present invention;
Fig. 8 is a functional block diagram of an embodiment of the spam short message filtering system of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, a first embodiment of the spam short message filtering method of the present invention includes:
Step S101: Classify the pending short message to obtain a classification result;
The mobile terminal classifies the pending short message according to its stored classification lexicon to obtain the corresponding classification result, where the classification result is spam short message or non-spam short message. The classification lexicon stored on the mobile terminal is kept synchronized at all times with the classification lexicon stored on the cloud server, which the cloud server obtains by learning from its stored private short message training set corresponding to the mobile terminal and its public short message training set. The private short message training set corresponding to the mobile terminal may be empty, or may store classified spam and/or non-spam short messages uploaded by the mobile terminal. When the private set is empty, the classification lexicon is obtained by the cloud server learning from the public set and the empty private set, that is, from the public set alone; when the private set is not empty, the classification lexicon is obtained by the cloud server learning from both the private set corresponding to the mobile terminal and the public set. The cloud server stores one public short message training set and multiple private short message training sets, each of which corresponds to one mobile terminal.
The public short message training set stores a certain number of classified spam and non-spam short messages, and all mobile terminals served by the cloud server share this one public set. A private short message training set stores the classified spam and non-spam short messages uploaded by a mobile terminal, and different mobile terminals correspond to different private sets.
Step S102: Upload classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal;
After the mobile terminal obtains the classification result of the pending short message, the user judges whether that result is wrong, where a wrong classification result is one that classifies a spam pending short message as non-spam or a non-spam pending short message as spam. Certain short messages may be spam for some users but non-spam for others, so different users may judge the correctness of the classification result for the same pending short message differently.
When the user judges the classification result to be wrong and the mobile terminal receives the upload instruction corresponding to that wrong classification result, the mobile terminal uploads the classification error information to the cloud server according to the upload instruction, so that the cloud server updates the private short message training set corresponding to the mobile terminal. The classification error information includes the pending short message and the corresponding wrong classification result.
Step S103: Obtain lexicon update information from the cloud server to synchronously update the classification lexicon stored on the mobile terminal.
The mobile terminal obtains lexicon update information from the cloud server to synchronously update its stored classification lexicon. The lexicon update information is obtained by the cloud server learning from the private and public short message training sets after the private short message training set corresponding to the mobile terminal and/or the public short message training set stored on the cloud server has been updated. In other words, lexicon update information is produced whenever any one of the following three situations occurs: (1) the public short message training set is updated, (2) the private short message training set is updated, (3) the private and public short message training sets are updated at the same time. The cloud server may periodically add a certain number of classified spam and/or non-spam short messages to the public short message training set to update it. When the private short message training set is empty, that is, it stores no classified short messages uploaded by the mobile terminal and has not been updated, the lexicon update information is obtained by the cloud server learning from the updated public set and the empty private set, that is, from the updated public set alone. When the private set is not empty, the lexicon update information is obtained by the cloud server learning from both sets after the private and/or public set is updated. After the cloud server obtains the lexicon update information through learning, the mobile terminal downloads it from the cloud server over GPRS, WiFi, or the like. The mobile terminal only needs to download the lexicon update information, which is small, rather than the entire updated classification lexicon on the cloud server, which reduces the data traffic the mobile terminal needs to update its classification lexicon. The mobile terminal classifies subsequent pending short messages according to the updated classification lexicon, forming a cycle.
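The incremental update can be pictured as merging a small delta into the locally stored lexicon rather than replacing the lexicon wholesale. The sketch below assumes a hypothetical update record holding the two class proportions and the changed matching probabilities for one class; the field names and layout are illustrative, not taken from the patent.

```python
def apply_update(lexicon, update):
    """Merge lexicon update information into the local lexicon in place.

    Assumed layouts:
      lexicon: {"p_spam": float, "p_non_spam": float,
                "spam": {feature: prob}, "non-spam": {feature: prob}}
      update:  {"label": "spam" or "non-spam",
                "match_prob": {feature: prob},
                "p_spam": float, "p_non_spam": float}
    """
    lexicon["p_spam"] = update["p_spam"]
    lexicon["p_non_spam"] = update["p_non_spam"]
    # Only the features carried by the update are overwritten or added.
    lexicon[update["label"]].update(update["match_prob"])
    return lexicon
```

Entries not mentioned in the update are left untouched, which is what keeps the download small compared with the full lexicon.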
Relying on the powerful processing capability of the cloud server, the present invention updates the private short message training set corresponding to the mobile terminal according to the classification error information uploaded by the mobile terminal and, after the private and/or public short message training set is updated, learns again in combination with the word segmentation lexicon and the stop-word lexicon. This further learning provides the mobile terminal with a classification lexicon that reflects both common and personalized characteristics, which continuously improves the speed and accuracy with which the mobile terminal filters spam short messages and thus its filtering efficiency. It also gives the mobile terminal personalized spam filtering that meets the different filtering needs of different users.
Referring to Fig. 2, in the first embodiment of the spam short message filtering method of the present invention, the step in which the mobile terminal classifies the pending short message according to its stored classification lexicon to obtain a classification result specifically includes the following sub-steps:
Sub-step S1011: Preprocess the pending short message to obtain its word features and rule features;
The preprocessing by which the mobile terminal obtains the word features and rule features of the pending short message specifically includes:
The mobile terminal segments the pending short message: by querying the word segmentation dictionary it stores, it divides the pending short message into individual meaningful word features. Chinese word segmentation splits Chinese short message text into entries, that is, the smallest meaningful language elements in Chinese that can be used independently. English short message text is split into individual word features according to the separators between words, such as spaces. The segmentation method of this embodiment is an intelligent segmentation method, namely one that uses the hidden Markov model (Hidden Markov Model, HMM) algorithm. In other embodiments, segmentation may also be performed by a dictionary-based method, a segmentation-mark method, a statistics-based method, a rule-based method, or the like, without limitation here.
According to the stop-word dictionary it stores, the mobile terminal deletes the word features that do not contribute to short message classification, including single characters left over after segmentation, interjections, modal particles, pronouns, and the like.
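The segmentation and stop-word steps above can be sketched as follows. This is a minimal sketch, not the HMM-based method of this embodiment: it assumes the dictionary-based alternative mentioned above, implemented as forward maximum matching, and the function names, the `max_len` parameter, and the sample dictionaries are hypothetical.

```python
def segment_english(text):
    # English text: split on whitespace separators between words.
    return text.lower().split()

def segment_forward_max_match(text, dictionary, max_len=4):
    # Dictionary-based segmentation (assumed variant: forward maximum matching).
    # At each position, take the longest dictionary entry; fall back to one character.
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def remove_stop_words(tokens, stop_words):
    # Drop listed stop words and single characters left over by segmentation.
    return [t for t in tokens if t not in stop_words and len(t) > 1]

# Hypothetical dictionaries for illustration only.
segmentation_dictionary = {"免费", "领取", "奖品"}
stop_word_dictionary = {"吧", "the", "a"}
tokens = segment_forward_max_match("免费领取奖品吧", segmentation_dictionary)
word_features = remove_stop_words(tokens, stop_word_dictionary)
```

Removing every single-character token is a simplification of the stop-word rule described above; a real stop-word dictionary would decide this per entry.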
After deleting the word features that do not contribute, the mobile terminal further selects, from the remaining word features, those that contribute more to short message classification. It does so by computing the mutual information MI(A;C) between whether each remaining word feature A appears and a class C, where the class C is one of two classes: spam short message C1 and non-spam short message C2. The formula for the mutual information MI(A;C) is as follows:
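The formula is not reproduced in this text. A standard form of the mutual information between the occurrence of a word feature and the class, assumed here rather than taken from the original, treats the occurrence of A as a binary variable a (1 if A appears in the short message, 0 otherwise):

```latex
\mathrm{MI}(A;C) = \sum_{a \in \{0,1\}} \sum_{c \in \{C_1, C_2\}} P(a, c)\,\log \frac{P(a, c)}{P(a)\,P(c)}
```

Here P(a, c) is the fraction of training short messages of class c in which A does (a = 1) or does not (a = 0) appear, and P(a) and P(c) are the corresponding marginal fractions.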
The word features with the highest mutual information MI(A;C) are then selected as the word features used for the classification decision.
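The selection by mutual information can be sketched as follows. This is a minimal sketch that assumes the standard definition of mutual information between a binary occurrence variable and the two classes, computed from message counts; the function names, the count layout, and the cut-off `k` are hypothetical.

```python
import math

def mutual_information(n11, n10, n01, n00):
    # n11: spam messages containing the word, n10: non-spam containing it,
    # n01: spam without it, n00: non-spam without it.
    total = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    mi = 0.0
    for joint, word_total, class_total in cells:
        if joint > 0:  # a zero cell contributes nothing to the sum
            mi += (joint / total) * math.log2(joint * total / (word_total * class_total))
    return mi

def select_top_features(counts, k):
    # Keep the k word features with the highest mutual information.
    ranked = sorted(counts, key=lambda w: mutual_information(*counts[w]), reverse=True)
    return ranked[:k]

# Hypothetical counts: "free" separates the classes perfectly, "the" not at all.
counts = {"free": (50, 0, 0, 50), "the": (25, 25, 25, 25)}
selected = select_top_features(counts, 1)
```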
The mobile terminal obtains the rule features of the pending short message. The rule features include the short message length, whether the short message contains a URL, whether it contains a telephone number, and whether the sender's number is a mobile phone number.
The pending short message X is expressed as X = {x1, x2, …, xn}, where xk (k = 1, 2, …, n) are the word features and rule features corresponding to the pending short message.
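The four rule features can be sketched as follows. This is a minimal sketch: the regular expressions, in particular the assumed mobile-number format (an optional 86 prefix followed by 11 digits starting with 1) and the assumption that any run of seven or more digits is a telephone number, are illustrative choices not given in the original, as are the function and key names.

```python
import re

URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\d{7,}")               # assumption: 7+ consecutive digits
MOBILE_RE = re.compile(r"^(\+?86)?1\d{10}$")   # assumption: mainland-China mobile format

def rule_features(text, sender):
    # Returns the four rule features named in the description.
    return {
        "length": len(text),
        "has_url": bool(URL_RE.search(text)),
        "has_phone": bool(PHONE_RE.search(text)),
        "sender_is_mobile": bool(MOBILE_RE.match(sender)),
    }

sample = "Win now at http://example.com or call 4001234567"
features = rule_features(sample, "13800138000")
```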
Sub-step S1012: Substitute the proportion of spam short messages, the proportion of non-spam short messages, and the matching probabilities of the word features and rule features in spam short messages and in non-spam short messages into the Bayes classification formula;
The mobile terminal takes the proportion of spam short messages P(C1) and the proportion of non-spam short messages P(C2) stored in the classification lexicon, together with the matching probability P(xk|C1) in spam short messages and the matching probability P(xk|C2) in non-spam short messages of the word features and rule features xk corresponding to the pending short message, and substitutes them into the Bayes classification formula to obtain the probability P(C1|X) that the pending short message is a spam short message. The Bayes classification formula is as follows:
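The formula is not reproduced in this text. Assuming the usual naive Bayes form, in which the features xk are treated as conditionally independent given the class (an assumption, not a statement of the original formula), it would read:

```latex
P(C_1 \mid X) = \frac{P(C_1)\prod_{k=1}^{n} P(x_k \mid C_1)}{P(C_1)\prod_{k=1}^{n} P(x_k \mid C_1) + P(C_2)\prod_{k=1}^{n} P(x_k \mid C_2)}
```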
Here, the proportion of spam short messages P(C1) is the ratio of the number of spam short messages in the private short message training set corresponding to the mobile terminal and the public short message training set to the number of all short messages (that is, spam and non-spam short messages). The proportion of non-spam short messages P(C2) is the ratio of the number of non-spam short messages in the private short message training set corresponding to the mobile terminal and the public short message training set to the number of all short messages. The classification lexicon corresponding to the mobile terminal stores the proportion of spam short messages P(C1), the proportion of non-spam short messages P(C2), and the matching probabilities of the word features and rule features in spam short messages, P(xk|C1), and in non-spam short messages, P(xk|C2). Different mobile terminals correspond to different classification lexicons.
Sub-step S1013: Obtain the probability that the pending short message is a non-spam short message;
The mobile terminal further obtains the probability P(C2|X) that the pending short message is a non-spam short message, as follows:
P(C2|X)=1-P(C1|X)
In other embodiments, the Bayes classification formula may also be used to obtain the probability that the pending short message is a non-spam short message, without limitation here.
Sub-step S1014: Obtain the classification result of the pending short message.
The mobile terminal obtains the classification result of the pending short message from the probability P(C1|X) that it is a spam short message and the probability P(C2|X) that it is a non-spam short message: when P(C1|X) > P(C2|X), the pending short message is classified as a spam short message; otherwise it is classified as a non-spam short message. Equivalently, the decision may be made by checking whether P(C1|X) is greater than 0.5: when P(C1|X) is greater than 0.5 the pending short message is a spam short message, and otherwise it is a non-spam short message.
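Sub-steps S1012 to S1014 can be sketched together as follows. This is a minimal sketch under the assumption of a naive Bayes product over the features; the lexicon layout (`p_spam`, `p_ham`, `match`), the probability floor for unseen features, and the sample values are hypothetical. The product is evaluated in log space only to avoid underflow for long messages.

```python
import math

def spam_posterior(features, lexicon, floor=1e-6):
    # Computes P(C1|X) from the priors and matching probabilities in the lexicon.
    log_spam = math.log(lexicon["p_spam"])
    log_ham = math.log(lexicon["p_ham"])
    for x in features:
        p_x_spam, p_x_ham = lexicon["match"].get(x, (floor, floor))
        log_spam += math.log(max(p_x_spam, floor))  # floor avoids log(0)
        log_ham += math.log(max(p_x_ham, floor))
    top = max(log_spam, log_ham)
    spam, ham = math.exp(log_spam - top), math.exp(log_ham - top)
    return spam / (spam + ham)

def classify(features, lexicon):
    p1 = spam_posterior(features, lexicon)  # P(C1|X)
    p2 = 1.0 - p1                           # P(C2|X) = 1 - P(C1|X)
    return "spam" if p1 > p2 else "non-spam"

# Hypothetical lexicon: feature -> (P(x|C1), P(x|C2)).
lexicon = {
    "p_spam": 0.4,
    "p_ham": 0.6,
    "match": {"free": (0.5, 0.05), "meeting": (0.02, 0.3)},
}
result = classify(["free"], lexicon)
```

Features missing from the lexicon receive the same floor value for both classes, so they do not shift the posterior in either direction.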
When the classification result is judged to be wrong and the mobile terminal receives an upload instruction for the wrong classification result, at least the private short message training set corresponding to the mobile terminal that is stored on the cloud server is updated. The lexicon update information obtained as a result includes at least the following, as they stand after the private short message training set is updated: the matching probabilities of the word features and rule features corresponding to the pending short message in spam short messages or in non-spam short messages, the proportion of spam short messages P(C1), and the proportion of non-spam short messages P(C2). Specifically, when the wrong classification result is that a pending short message belonging to the spam class was classified as a non-spam short message, the matching probabilities of the word features and rule features in spam short messages, P(C1), and P(C2) are updated. When the wrong classification result is that a pending short message belonging to the non-spam class was classified as a spam short message, the matching probabilities of the word features and rule features in non-spam short messages, P(C1), and P(C2) are updated.
In addition, before the step in which the mobile terminal preprocesses the pending short message to obtain the corresponding word features and rule features, the method further includes:
The mobile terminal judges whether the sender's number of the pending short message is in the private blacklist or whitelist corresponding to the mobile terminal. When the sender's number is in the private blacklist corresponding to the mobile terminal, the pending short message is a spam short message; when the sender's number is in the private whitelist corresponding to the mobile terminal, the pending short message is a non-spam short message.
When the sender's number is in neither the private blacklist nor the private whitelist corresponding to the mobile terminal, the mobile terminal goes on to judge whether the sender's number is in the public blacklist or whitelist. When the sender's number is in the public blacklist, the pending short message is a spam short message; when the sender's number is in the public whitelist, the pending short message is a non-spam short message.
When the sender's number is in neither the public blacklist nor the public whitelist, the mobile terminal executes the above step of preprocessing the pending short message, that is, sub-step S1011, to obtain the word features and rule features corresponding to the pending short message.
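The order of the list checks described above can be sketched as follows. This is a minimal sketch; the function name, the use of Python sets for the lists, and the `None` return value meaning "fall through to sub-step S1011" are hypothetical.

```python
def check_lists(sender, private_black, private_white, public_black, public_white):
    # Private lists are consulted first, then public lists.
    # Returns "spam", "non-spam", or None when no list contains the sender.
    for black, white in ((private_black, private_white), (public_black, public_white)):
        if sender in black:
            return "spam"
        if sender in white:
            return "non-spam"
    return None  # continue with preprocessing and Bayes classification

# A private whitelist entry takes precedence over a public blacklist entry.
decision = check_lists("13800138000", set(), {"13800138000"}, {"13800138000"}, set())
```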
After the mobile terminal has performed step S101 above to obtain the classification result of the pending short message, when the classification result is judged to be wrong and the mobile terminal receives an upload instruction for the wrong classification result, the classification error information that the mobile terminal uploads to the cloud server further includes the sender's number of the pending short message. The mobile terminal uploads the sender's number so that the cloud server can judge whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server; if so, the cloud server updates the private lists corresponding to the mobile terminal and/or the public lists that it stores. Specifically, after the mobile terminal uploads the sender's number of the pending short message, the cloud server first adds the sender's number to the private blacklist or whitelist corresponding to the mobile terminal, and adds it to the public blacklist or whitelist once the number of reports for that sender's number reaches a certain quantity. For example, if more than a preset number of users, such as 10,000, report a sender's number, that sender's number is added to the public blacklist. Likewise, when more than another preset number of users, such as 100, report a sender's number and the short message content clearly contains illegal content, that sender's number is added to the public blacklist.
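The two example thresholds can be sketched as follows. This is a minimal sketch; the constant and function names are hypothetical, and how the cloud server decides that content is clearly illegal is outside the scope of the sketch and is passed in as a flag.

```python
PUBLIC_REPORT_THRESHOLD = 10_000    # example preset quantity from the description
ILLEGAL_CONTENT_THRESHOLD = 100     # example preset quantity from the description

def should_add_to_public_blacklist(reporting_user_count, clearly_illegal):
    # Rule 1: more than 10,000 users reported the sender's number.
    if reporting_user_count > PUBLIC_REPORT_THRESHOLD:
        return True
    # Rule 2: more than 100 users reported it and the content is clearly illegal.
    return clearly_illegal and reporting_user_count > ILLEGAL_CONTENT_THRESHOLD

promote = should_add_to_public_blacklist(101, clearly_illegal=True)
```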
When the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server is updated, the mobile terminal obtains the private list update information and/or the public list update information from the cloud server via GPRS, WiFi, or the like, in order to synchronously update the public and/or private lists stored on the mobile terminal. The mobile terminal then judges subsequent pending short messages using the updated public and/or private lists. For example, after a pending short message has been correctly judged to be a spam short message, or after a pending short message belonging to the spam class has been classified as a non-spam short message, the sender's number of the pending short message is uploaded to the cloud server, and the cloud server adds that sender's number to the private blacklist or whitelist corresponding to the mobile terminal.
It can be understood that, in the first embodiment of the method for filtering spam short messages of the present invention, the mobile terminal classifies according to the classification lexicon that the cloud server obtains by learning from the private short message training set corresponding to the mobile terminal and the public short message training set. When a classification result is wrong, the mobile terminal uploads the classification error information so that the classification lexicon corresponding to the mobile terminal is updated in time. The mobile terminal can classify without learning from short message samples itself, which improves its filtering efficiency for spam short messages. Because different mobile terminals correspond to different private short message training sets and classification lexicons, the filtering of spam short messages is personalized and its accuracy is improved.
In addition, the present invention obtains both the word features of a short message, using the word segmentation dictionary and the stop-word dictionary, and rule features such as the short message length, whether the short message contains a URL, whether it contains a telephone number, and whether the sender's number is a mobile phone number. By substituting the matching probabilities of the word features and rule features into the Bayes classification formula, the probability that the pending short message is a spam short message is computed directly and more accurately, and the decision is made quickly. The calculation is simple, fast, and efficient, and greatly reduces the processing workload of the mobile terminal.
Referring to Fig. 3, the second embodiment of the method for filtering spam short messages of the present invention includes:
Step S201: Learn from the private short message training set and the public short message training set;
The cloud server learns from the private short message training set corresponding to the mobile terminal and the public short message training set that it stores, to obtain the classification lexicon corresponding to the mobile terminal. The private short message training set corresponding to the mobile terminal may be empty, or may store classified spam short messages and/or non-spam short messages uploaded by the mobile terminal. When the private short message training set corresponding to the mobile terminal is empty, the cloud server learns from the public short message training set and the empty private short message training set to obtain the classification lexicon, which means that at this point the cloud server learns only from the public short message training set. When the private short message training set corresponding to the mobile terminal is not empty, the cloud server learns from both the private short message training set corresponding to the mobile terminal and the public short message training set to obtain the classification lexicon. This learning specifically includes: the cloud server preprocesses the private short message training set corresponding to the mobile terminal and the public short message training set according to the word segmentation dictionary and stop-word dictionary it stores, to obtain the word features and rule features corresponding to each spam short message and non-spam short message in the two training sets; then, according to the number of spam short messages and the number of non-spam short messages, it obtains the matching probability of each word feature and rule feature in spam short messages and in non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages.
When the word segmentation dictionary and/or stop-word dictionary stored on the cloud server is updated, the word segmentation dictionary and/or stop-word dictionary stored on the mobile terminal is kept synchronized with the cloud server. The classification lexicon is used by the mobile terminal to classify pending short messages and obtain a classification result, where the classification result is either spam short message or non-spam short message. For each different mobile terminal, the cloud server stores the classification lexicon corresponding to that mobile terminal. The public short message training set stores a certain number of classified spam short messages and non-spam short messages.
Before a pending short message is classified for the first time, the mobile terminal may upload a certain number of short messages that the user has judged to be spam or non-spam, to be stored in the private short message training set corresponding to the mobile terminal on the cloud server. Alternatively, the private short message training set corresponding to the mobile terminal may be empty when spam filtering begins. Before a pending short message is classified for the first time, the mobile terminal obtains the classification lexicon corresponding to the mobile terminal that is stored on the cloud server, via GPRS, WiFi, or the like, in order to perform the classification.
Step S202: Receive the classification error information uploaded by the mobile terminal;
When the user judges that the classification result obtained by the mobile terminal is wrong and the mobile terminal receives an upload instruction for the wrong classification result, the cloud server receives the classification error information uploaded by the mobile terminal. The classification error information includes the pending short message and the wrong classification result. The wrong classification result is either that a pending short message belonging to the spam class was classified as a non-spam short message, or that a pending short message belonging to the non-spam class was classified as a spam short message.
Step S203: Add the pending short message to the private short message training set;
The cloud server adds the pending short message contained in the classification error information to the private short message training set corresponding to the mobile terminal, thereby updating that training set. When the wrong classification result uploaded by the mobile terminal is that a pending short message belonging to the spam class was classified as a non-spam short message, the cloud server adds the pending short message to the spam class of the private short message training set. When the wrong classification result uploaded by the mobile terminal is that a pending short message belonging to the non-spam class was classified as a spam short message, the cloud server adds the pending short message to the non-spam class of the private short message training set.
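Step S203 can be sketched as follows. This is a minimal sketch; the dictionary layout of the private training set, the label strings, and the function name are hypothetical.

```python
def add_to_private_set(private_set, message, wrong_label):
    # The message is stored under the class opposite to the wrong classification result.
    if wrong_label not in ("spam", "non-spam"):
        raise ValueError("wrong_label must be 'spam' or 'non-spam'")
    correct_label = "spam" if wrong_label == "non-spam" else "non-spam"
    private_set[correct_label].append(message)
    return correct_label

private_set = {"spam": [], "non-spam": []}
# A spam message that the terminal wrongly classified as non-spam.
stored_as = add_to_private_set(private_set, "win a prize now", "non-spam")
```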
Step S204: Learn from the private short message training set and the public short message training set.
After the private short message training set and/or the public short message training set is updated, the cloud server learns from the private short message training set and the public short message training set to obtain lexicon update information. The lexicon update information is obtained in one of the following two cases: (1) when the private short message training set is empty, that is, it stores no classified short messages uploaded by the mobile terminal and has not been updated, the lexicon update information is obtained by the cloud server learning from the updated public short message training set; (2) when the private short message training set is not empty, the lexicon update information is obtained by the cloud server learning from the public short message training set and the private short message training set after the private short message training set and/or the public short message training set is updated. The mobile terminal synchronously updates the classification lexicon it stores according to the lexicon update information. At the same time, the classification lexicon on the cloud server is also updated according to the lexicon update information, and the lexicon update information may be stored in the classification lexicon corresponding to the mobile terminal on the cloud server. Before a pending short message is classified, the matching probabilities of each word feature and rule feature in spam short messages and non-spam short messages held in the classification lexicon on the mobile terminal are kept synchronized with the classification lexicon corresponding to the mobile terminal stored on the cloud server.
Referring to Fig. 4, when the wrong classification result is that a pending short message belonging to the spam class was classified as a non-spam short message, the step in the second embodiment of the method for filtering spam short messages of the present invention in which the cloud server, after the private short message training set is updated, learns from the private short message training set and the public short message training set to obtain the lexicon update information specifically includes the following sub-steps:
Sub-step S2041a: Preprocess the pending short message to obtain the word features and rule features corresponding to the pending short message;
The cloud server preprocesses the pending short message to obtain the word features and rule features corresponding to the pending short message. The pending short message X is expressed as X = {x1, x2, …, xn}, where xk (k = 1, 2, …, n) are the word features and rule features corresponding to the pending short message.
Sub-step S2042a: Obtain first lexicon update information according to the matching frequencies of the word features and rule features, the number of spam short messages, and the number of non-spam short messages.
The cloud server obtains the first lexicon update information according to the following: the matching frequency in spam short messages, within the public short message training set, of the word features and rule features xk corresponding to the pending short message; the matching frequency in spam short messages, within the private training set, of the same xk; and the numbers of spam short messages and non-spam short messages in the private short message training set and the public short message training set. The first lexicon update information includes, as they stand after the private short message training set corresponding to the mobile terminal is updated, the matching probabilities in spam short messages of the word features and rule features xk corresponding to the pending short message, the proportion of spam short messages, and the proportion of non-spam short messages. The mobile terminal synchronously updates the classification lexicon according to the first lexicon update information. That is, it modifies the matching probabilities in spam short messages of the word features and rule features xk already stored in the classification lexicon corresponding to the mobile terminal, adds to the classification lexicon the matching probabilities in spam short messages of any word features not yet contained in it, and modifies the stored proportion of spam short messages and proportion of non-spam short messages. The matching probability of a word feature or rule feature xk in spam short messages equals the sum of its matching frequency in spam short messages in the public short message training set and its matching frequency in spam short messages in the private training set, divided by the number of spam short messages in the private short message training set and the public short message training set.
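The calculation in sub-step S2042a can be sketched as follows. This is a minimal sketch; the function name, the dictionary keys, and the sample counts are hypothetical, and it assumes that the frequencies and message counts passed in already include the newly added pending short message.

```python
def first_update_info(features, public_spam_freq, private_spam_freq, n_spam, n_ham):
    # n_spam / n_ham: spam and non-spam message counts over both training sets.
    total = n_spam + n_ham
    return {
        # (public matching frequency + private matching frequency) / spam count
        "p_match_spam": {
            x: (public_spam_freq.get(x, 0) + private_spam_freq.get(x, 0)) / n_spam
            for x in features
        },
        "p_spam": n_spam / total,
        "p_ham": n_ham / total,
    }

# Hypothetical counts: "free" matched 30 public and 2 private spam messages.
info = first_update_info(["free"], {"free": 30}, {"free": 2}, n_spam=40, n_ham=60)
```

The second lexicon update information for the non-spam case follows the same pattern with non-spam frequencies and the non-spam message count as the divisor.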
Referring to Fig. 5, when the wrong classification result is that a pending short message belonging to the non-spam class was classified as a spam short message, the step in the second embodiment of the method for filtering spam short messages of the present invention in which the cloud server, after the private short message training set is updated, learns from the private short message training set and the public short message training set to obtain the lexicon update information specifically includes the following sub-steps:
Sub-step S2041b: Preprocess the pending short message to obtain the word features and rule features corresponding to the pending short message;
The cloud server preprocesses the pending short message to obtain the word features and rule features corresponding to the pending short message.
Sub-step S2042b: Obtain second lexicon update information according to the matching frequencies of the word features and rule features, the number of spam short messages, and the number of non-spam short messages.
The cloud server obtains the second lexicon update information according to the following: the matching frequency in non-spam short messages, within the public short message training set, of the word features and rule features xk corresponding to the pending short message; the matching frequency in non-spam short messages, within the private training set, of the same xk; and the numbers of spam short messages and non-spam short messages in the private short message training set and the public short message training set. The second lexicon update information includes, as they stand after the private short message training set corresponding to the mobile terminal is updated, the matching probabilities in non-spam short messages of the word features and rule features xk corresponding to the pending short message, the proportion of spam short messages, and the proportion of non-spam short messages. The mobile terminal synchronously updates the classification lexicon according to the second lexicon update information. That is, it modifies the matching probabilities in non-spam short messages of the word features and rule features xk already stored in the classification lexicon corresponding to the mobile terminal, adds to the classification lexicon the matching probabilities in non-spam short messages of any word features not yet contained in it, and modifies the stored proportion of spam short messages and proportion of non-spam short messages. The matching probability of a word feature or rule feature xk in non-spam short messages equals the sum of its matching frequency in non-spam short messages in the public short message training set and its matching frequency in non-spam short messages in the private training set, divided by the number of non-spam short messages in the private short message training set and the public short message training set.
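How the mobile terminal might apply such update information to its stored classification lexicon can be sketched as follows. This is a minimal sketch; the lexicon layout (feature mapped to a pair of spam and non-spam matching probabilities), the update format, and the default of 0.0 for the class that a newly added feature has no value for are all assumptions made for illustration.

```python
def apply_update(lexicon, update, label):
    # label: "spam" for first update information, "non-spam" for second.
    column = 0 if label == "spam" else 1
    for x, p in update["p_match"].items():
        # Modify an existing entry, or add a feature the lexicon does not contain yet.
        entry = list(lexicon["match"].get(x, (0.0, 0.0)))
        entry[column] = p
        lexicon["match"][x] = tuple(entry)
    # The two class proportions are always replaced.
    lexicon["p_spam"] = update["p_spam"]
    lexicon["p_ham"] = update["p_ham"]
    return lexicon

lexicon = {"p_spam": 0.4, "p_ham": 0.6, "match": {"meeting": (0.02, 0.3)}}
update = {"p_match": {"meeting": 0.35, "lunch": 0.1}, "p_spam": 0.39, "p_ham": 0.61}
apply_update(lexicon, update, "non-spam")
```

Because only the changed probabilities and the two proportions are transferred, the update stays small compared with the whole lexicon.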
When the public short message training set is updated, the update consists of adding spam short messages, adding non-spam short messages, or adding both at the same time. As with the update of and learning from the private short message training set described above, the short messages in the updated part of the public short message training set are preprocessed, and the corresponding lexicon update information is then obtained according to the matching frequencies of the word features and rule features, the number of spam short messages, and the number of non-spam short messages. This information is used to update the matching probabilities of the word features and rule features in spam short messages and/or non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages. When the private short message training set and the public short message training set are updated at the same time, the update and learning proceed in the same way as described above for each training set, and the details are not repeated here.
When the cloud server learns from the private short message training set corresponding to the mobile terminal and the public short message training set, it uses the matching frequencies of the word features and rule features in spam short messages and non-spam short messages, together with the number of spam short messages and the number of non-spam short messages, to obtain the matching probabilities of the word features and rule features in spam short messages and non-spam short messages, the proportion of spam short messages, and the proportion of non-spam short messages. The matching probabilities and proportions obtained are stored in the classification lexicon, and different mobile terminals correspond to different classification lexicons. When the private short message training set and/or the public short message training set is updated, the cloud server only needs to preprocess the short messages in the updated part; that is, it retains the word features and rule features of each short message in the private and/or public short message training set from before the update. This improves the efficiency of preprocessing and learning on the cloud server, and in turn the efficiency of updating the classification lexicon.
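The incremental update described above, in which only newly added short messages are preprocessed and earlier results are retained, can be sketched as follows. This is a minimal sketch; the class name, its attributes, and the use of whitespace splitting as the feature extractor are hypothetical.

```python
from collections import Counter

class IncrementalTrainingSet:
    def __init__(self, extract_features):
        self.extract_features = extract_features  # preprocessing function
        self.cached = []                          # retained (features, label) pairs
        self.freq = {"spam": Counter(), "non-spam": Counter()}
        self.count = {"spam": 0, "non-spam": 0}

    def add(self, message, label):
        # Only the new message is preprocessed; earlier features stay cached.
        features = frozenset(self.extract_features(message))
        self.cached.append((features, label))
        self.freq[label].update(features)         # matching frequency per feature
        self.count[label] += 1

    def matching_probability(self, feature, label):
        n = self.count[label]
        return self.freq[label][feature] / n if n else 0.0

preprocessed = []
def extract(message):
    preprocessed.append(message)  # records how often preprocessing runs
    return message.split()

training = IncrementalTrainingSet(extract)
training.add("free prize", "spam")
training.add("free meeting", "non-spam")
training.add("free free gift", "spam")
```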
In addition, the classification error information that the cloud server receives from the mobile terminal further includes the sender's number of the pending short message. After receiving the sender's number, the cloud server judges whether to add it to the private blacklist or whitelist corresponding to the mobile terminal and/or the public blacklist or whitelist stored on the cloud server. If so, the cloud server updates the private lists corresponding to the mobile terminal and/or the public lists to obtain private list update information and/or public list update information, so that the mobile terminal synchronously updates the public and/or private lists it stores. The public list update information and the private list update information each include the sender's number and the list to which the sender's number is added. For example, if more than a preset number of users, such as 10,000, report a sender's number, that sender's number is added to the public blacklist. Likewise, when more than another preset number of users, such as 100, report a sender's number and the short message content clearly contains illegal content, that sender's number is added to the public blacklist. As another example, after a pending short message has been correctly judged to be a spam short message, or after a pending short message belonging to the spam class has been classified as a non-spam short message, the sender's number of the pending short message is uploaded to the cloud server, and the cloud server adds that sender's number to the private blacklist or whitelist corresponding to the mobile terminal.
It can be understood that, in the second embodiment of the method for filtering spam short messages of the present invention, the cloud server learns from the private short message training set corresponding to the mobile terminal and the public short message training set that it stores, to obtain the classification lexicon corresponding to the mobile terminal, and the mobile terminal classifies pending short messages according to that classification lexicon. After receiving classification error information uploaded by the mobile terminal, the cloud server learns again and obtains lexicon update information, so that the mobile terminal synchronously updates the classification lexicon it stores. Because the cloud server stores the public and private short message training sets, which occupy considerable space, and performs the computationally heavy learning process, the filtering efficiency of the mobile terminal for spam short messages is improved and the storage space occupied on the mobile terminal is reduced. Because the cloud server stores a corresponding private short message training set and classification lexicon for each different mobile terminal, the filtering of spam short messages is personalized, which in turn improves filtering accuracy.
Referring to Fig. 6, one embodiment of the mobile terminal of the present invention includes:
Classification module 301, configured to classify a pending short message according to the classification lexicon stored in the mobile terminal to obtain a classification result, and to classify subsequent pending short messages according to the updated classification lexicon. For the specific implementation, refer to the implementation process of step S101 described above, which is not repeated here.
Upload module 302, configured to upload classification error information to the cloud server, so as to update the private short message training set corresponding to the mobile terminal, when the classification result obtained by the classification module 301 is judged to be a wrong classification result and the mobile terminal receives an upload instruction for that wrong classification result. For the specific implementation, refer to the implementation process of step S102 described above, which is not repeated here.
Mobile terminal update module 303, configured to obtain the lexicon update information of the cloud server so as to synchronously update the classification lexicon stored in the mobile terminal, and to obtain the private black and white list update information and/or public black and white list update information of the cloud server so as to synchronously update the public black and white lists and/or private black and white lists stored in the mobile terminal. For the specific implementation, refer to the implementation process of step S103 described above, which is not repeated here.
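The upload condition handled by the upload module can be illustrated with a short sketch. It assumes the classification error information is serialized as JSON; the class name, field names and the JSON encoding are hypothetical choices for illustration, and the sender number is treated as optional because it is only needed for black and white list updates.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ClassificationErrorInfo:
    message_text: str                    # the pending short message
    wrong_result: str                    # the wrong classification result: "spam" or "non-spam"
    sender_number: Optional[str] = None  # optional, for black and white list updates


def build_upload(result_is_wrong: bool,
                 upload_instruction_received: bool,
                 info: ClassificationErrorInfo) -> Optional[str]:
    """Return the payload to upload, or None when either condition is not met."""
    # Both conditions must hold: the result was judged wrong AND the user
    # confirmed the upload.
    if not (result_is_wrong and upload_instruction_received):
        return None
    return json.dumps(asdict(info), ensure_ascii=False)


info = ClassificationErrorInfo("Win a free prize now", "non-spam", "10086000")
payload = build_upload(True, True, info)
```

Requiring an explicit upload instruction means message content leaves the handset only with the user's consent.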
Referring to Fig. 7, one embodiment of the cloud server of the present invention includes:
Learning module 401, configured to learn from the private short message training set corresponding to the mobile terminal and the public short message training set stored in the cloud server to obtain the classification lexicon corresponding to the mobile terminal; and further configured to, after the private short message training set and/or the public short message training set is updated, learn from the private short message training set and the public short message training set to obtain lexicon update information, so that the mobile terminal synchronously updates the classification lexicon stored in the mobile terminal according to the lexicon update information. For the specific implementation, refer to the implementation process of step S201 described above, which is not repeated here.
Cloud server update module 402, configured to receive the classification error information uploaded by the mobile terminal when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction for that wrong classification result, and to add the pending short message in the classification error information to the private short message training set corresponding to the mobile terminal so as to update that training set; and further configured to judge whether to add the sender number to the private black and white lists corresponding to the mobile terminal and/or the public black and white lists that it stores, and if so, to update the private black and white lists corresponding to the mobile terminal and/or the public black and white lists to obtain private black and white list update information and/or public black and white list update information. For the specific implementation, refer to the implementation process of step S202 described above, which is not repeated here.
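The training set update performed by the cloud server update module might look like the sketch below. It assumes an in-memory mapping from a terminal identifier to that terminal's private training set, and it assumes the message is stored under the corrected label, that is, the opposite of the wrong classification result; the names `handle_error_upload`, `terminal_id` and the dictionary keys are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# terminal id -> list of (message text, is_spam); a stand-in for real storage
PrivateSets = DefaultDict[str, List[Tuple[str, bool]]]


def handle_error_upload(private_sets: PrivateSets,
                        terminal_id: str,
                        error_info: Dict[str, str]) -> int:
    """Add the misclassified message to the terminal's private training set.

    Returns the new size of that private training set.
    """
    # The uploaded result is the WRONG one, so the corrected label is its opposite.
    is_spam = error_info["wrong_result"] != "spam"
    private_sets[terminal_id].append((error_info["message_text"], is_spam))
    return len(private_sets[terminal_id])


private_sets: PrivateSets = defaultdict(list)
size = handle_error_upload(
    private_sets,
    "terminal-1",
    {"message_text": "Win a free prize now", "wrong_result": "non-spam"},
)
```

Keeping one training set per terminal is what lets the later learning step produce a lexicon tailored to that terminal's user.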
Referring to Fig. 8, one embodiment of the short message filtering system of the present invention includes a mobile terminal and a cloud server:
The mobile terminal includes: private black and white lists, public black and white lists, a classification lexicon, a word segmentation lexicon, a stop-word lexicon, a private black and white list filtering module 501, a public black and white list filtering module 502, a classification module 503, an upload module 504 and a mobile terminal update module 505, wherein the private black and white lists, public black and white lists, classification lexicon, word segmentation lexicon and stop-word lexicon are all kept synchronized with the cloud server by the mobile terminal update module 505.
The private black and white list filtering module 501 and the public black and white list filtering module 502 are configured to filter pending short messages against the private black and white lists and the public black and white lists, which provides a fast preliminary filtering of spam short messages. For the specific implementation, refer to the implementation process of the black and white list filtering steps described above, which is not repeated here.
Classification module 503, configured to, when the sender number of the pending short message is in neither the public nor the private black and white lists, first preprocess the pending short message according to the word segmentation lexicon and the stop-word lexicon to obtain word features and rule features, and then classify the pending short message according to the classification lexicon stored in the mobile terminal to obtain a classification result. For the specific implementation, refer to the implementation process of step S101 described above, which is not repeated here.
Upload module 504, configured to upload classification error information to the cloud server, so as to update the private short message training set and the private black and white lists corresponding to the mobile terminal, when the classification result of the classification module 503 is a wrong classification result and the mobile terminal receives an upload instruction for that wrong classification result. For the specific implementation, refer to the implementation process of step S102 described above, which is not repeated here.
Mobile terminal update module 505, configured to obtain the public black and white list update information and/or private black and white list update information of the cloud server so as to synchronously update the public black and white lists and/or private black and white lists stored in the mobile terminal; further configured to obtain the lexicon update information of the cloud server so as to synchronously update the classification lexicon stored in the mobile terminal; and further configured to obtain the word segmentation lexicon update information and/or stop-word lexicon update information of the cloud server so as to synchronously update the word segmentation lexicon and/or stop-word lexicon stored in the mobile terminal. For the specific implementation, refer to the implementation process of step S103 described above, which is not repeated here.
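The order in which the mobile terminal modules act on a pending short message can be sketched as follows. The sketch assumes the private lists are consulted before the public ones, as the claims describe, and that the lexicon-based classifier runs only when the sender number is on no list; the function names and the representation of lists as Python sets are illustrative assumptions.

```python
from typing import Callable, Optional, Set

SPAM = "spam"
NON_SPAM = "non-spam"


def check_lists(sender: str, blacklist: Set[str], whitelist: Set[str]) -> Optional[str]:
    """Return a verdict if the sender number is on either list, otherwise None."""
    if sender in blacklist:
        return SPAM
    if sender in whitelist:
        return NON_SPAM
    return None


def filter_message(sender: str, text: str,
                   private_black: Set[str], private_white: Set[str],
                   public_black: Set[str], public_white: Set[str],
                   classify: Callable[[str], str]) -> str:
    """Private lists first, then public lists, then the lexicon-based classifier."""
    verdict = check_lists(sender, private_black, private_white)
    if verdict is None:
        verdict = check_lists(sender, public_black, public_white)
    if verdict is None:
        # No list matched: fall back to the (more expensive) content classifier.
        verdict = classify(text)
    return verdict
```

Checking the private lists first lets a user's own choice override the shared public list, for example when a number that is whitelisted for most users is unwanted by one of them.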
The cloud server includes: a word segmentation lexicon, a stop-word lexicon, a public short message training set, a private short message training set, public black and white lists, private black and white lists, a classification lexicon, a learning module 506 and a cloud server update module 507. The word segmentation lexicon, stop-word lexicon, public short message training set and public black and white lists are shared by all mobile terminals in the spam filtering system, whereas the private short message training set, private black and white lists and classification lexicon each correspond to an individual mobile terminal and differ from one mobile terminal to another.
Learning module 506, configured to learn, according to the word segmentation lexicon and stop-word lexicon stored in the cloud server, from the public short message training set and/or the private short message training set corresponding to the mobile terminal stored in the cloud server, so as to obtain the classification lexicon corresponding to the mobile terminal; and further configured to, after the public short message training set and/or the private short message training set is updated, learn from the public short message training set and/or the private short message training set to obtain lexicon update information, so that the mobile terminal synchronously updates the classification lexicon stored in the mobile terminal according to the lexicon update information. For the specific implementation, refer to the implementation process of step S201 described above, which is not repeated here.
Cloud server update module 507, configured to receive the classification error information uploaded by the mobile terminal; further configured to add the pending short message to the private short message training set corresponding to the mobile terminal so as to update the private short message training set, and to update the public black and white lists and/or private black and white lists to obtain public black and white list update information and/or private black and white list update information. For the specific implementation, refer to the implementation process of step S202 described above, which is not repeated here.
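The learning step carried out on the cloud server can be illustrated with a small sketch. It assumes that each training message has already been preprocessed into a set of features, that the public and private sets are simply pooled, and that a matching probability is the fraction of messages of a class that contain the feature; the patent text here does not give the exact estimator, so these choices, the function name and the dictionary keys are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Sample = Tuple[Set[str], bool]  # (features of one message, is_spam)


def learn_lexicon(public_set: Iterable[Sample],
                  private_set: Iterable[Sample]) -> Dict[str, object]:
    """Estimate class proportions and per-feature matching probabilities."""
    spam_freq: Counter = Counter()      # matching frequency of features in spam
    non_spam_freq: Counter = Counter()  # matching frequency in non-spam
    n_spam = n_non_spam = 0
    for features, is_spam in list(public_set) + list(private_set):
        if is_spam:
            n_spam += 1
            spam_freq.update(features)
        else:
            n_non_spam += 1
            non_spam_freq.update(features)
    total = n_spam + n_non_spam
    if total == 0:
        raise ValueError("training sets are empty")
    return {
        "p_spam": n_spam / total,
        "p_non_spam": n_non_spam / total,
        "p_feature_spam": {f: c / n_spam for f, c in spam_freq.items()},
        "p_feature_non_spam": {f: c / n_non_spam for f, c in non_spam_freq.items()},
    }


public = [({"free", "win"}, True), ({"hello"}, False)]
private = [({"free"}, True)]
lexicon = learn_lexicon(public, private)
```

A production system would normally add smoothing so that a feature never seen in one class does not get a probability of zero; that is left out to keep the sketch close to the frequencies and counts the description names.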
The public short message training set stores a certain number of already classified spam short messages and non-spam short messages. The statistics that the cloud server derives from it, namely the matching frequency of the word features and rule features of its short messages in the spam short messages of the public short message training set, the number of spam short messages in the public short message training set, the matching frequency of the word features and rule features in the non-spam short messages of the public short message training set, and the number of non-spam short messages in the public short message training set, may be stored in the public short message training set itself or in another storage location of the cloud server, such as the learning module 506.
The private short message training set stores the classified spam short messages and non-spam short messages uploaded by the mobile terminal. Similarly, information obtained by the cloud server, such as the matching frequency in the private short message training set of the word features and rule features of its short messages, may be stored in the private short message training set or in another storage location of the cloud server, such as the learning module 506.
The classification lexicon stores what the cloud server obtains by learning from the private short message training set corresponding to the mobile terminal and the public short message training set: the matching probability of each word feature and rule feature in spam short messages and in non-spam short messages, the proportion of spam short messages and the proportion of non-spam short messages.
The word segmentation lexicon stores the meaningful word features that short messages may contain. The stop-word lexicon stores word features that contribute nothing to short message classification, including single characters formed after word segmentation, interjections, modal particles, pronouns and the like.
The public black and white lists store the sender numbers of spam short messages that users in general have added to the blacklist and the sender numbers of non-spam short messages added to the whitelist. The private black and white lists store, for the corresponding mobile terminal, the sender numbers of spam short messages added to the blacklist and the sender numbers of non-spam short messages added to the whitelist.
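To make the contents of the classification lexicon concrete, the sketch below stores the four quantities named above and uses them to score a message. The claims state that these values are substituted into a Bayes classification formula; the code assumes the standard two-class naive Bayes form, computed in log space, and assigns a small fixed probability to features missing from the lexicon. The class name, field names and the `unseen` default are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ClassificationLexicon:
    p_spam: float                         # P(C1), proportion of spam
    p_non_spam: float                     # P(C2), proportion of non-spam
    p_feature_spam: Dict[str, float]      # P(x_k | C1)
    p_feature_non_spam: Dict[str, float]  # P(x_k | C2)


def spam_probability(lex: ClassificationLexicon,
                     features: Iterable[str],
                     unseen: float = 1e-6) -> float:
    """Return P(C1 | X) under the naive Bayes assumption."""
    log_spam = math.log(lex.p_spam)
    log_non_spam = math.log(lex.p_non_spam)
    for feature in features:
        log_spam += math.log(lex.p_feature_spam.get(feature, unseen))
        log_non_spam += math.log(lex.p_feature_non_spam.get(feature, unseen))
    # Normalize the two scores; subtracting the maximum avoids underflow.
    top = max(log_spam, log_non_spam)
    spam_score = math.exp(log_spam - top)
    non_spam_score = math.exp(log_non_spam - top)
    return spam_score / (spam_score + non_spam_score)


lex = ClassificationLexicon(0.5, 0.5, {"free": 0.8}, {"free": 0.1})
p = spam_probability(lex, ["free"])
is_spam = p > 1 - p  # decision rule: spam when P(C1|X) > P(C2|X)
```

Because only a handful of probabilities are looked up per message, this scoring is cheap enough for the handset, while the estimation of those probabilities stays on the cloud server.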
The short message filtering system of the present invention has a distributed structure: the mobile terminal performs the classification of short messages, while the cloud server, which has greater processing capability and higher processing speed, performs the learning process that classification requires. This improves the filtering efficiency for spam short messages and personalizes the filtering.
The above are only embodiments of the present invention and are not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (11)
1. A method for filtering spam short messages, characterized by comprising:
a mobile terminal classifying a pending short message according to a classification lexicon that it stores to obtain a classification result, wherein the classification result is spam short message or non-spam short message;
when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction for the wrong classification result, the mobile terminal uploading classification error information to a cloud server to update a private short message training set corresponding to the mobile terminal, wherein the classification error information includes the pending short message and the wrong classification result;
the mobile terminal obtaining lexicon update information of the cloud server to synchronously update the classification lexicon stored in the mobile terminal, wherein the lexicon update information is obtained by the cloud server learning from the private short message training set and a public short message training set after the private short message training set corresponding to the mobile terminal and/or the public short message training set stored in the cloud server is updated, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
2. The method according to claim 1, characterized in that the step of the mobile terminal classifying the pending short message according to the classification lexicon that it stores to obtain the classification result specifically comprises:
the mobile terminal preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message;
the mobile terminal substituting the proportion of spam short messages $P(C_1)$, the proportion of non-spam short messages $P(C_2)$, the matching probability $P(x_k \mid C_1)$ of the word features and rule features in spam short messages and the matching probability $P(x_k \mid C_2)$ in non-spam short messages, all stored in the classification lexicon, into the Bayes classification formula to obtain the probability $P(C_1 \mid X)$ that the pending short message is a spam short message, the Bayes classification formula being as follows:
$$P(C_1 \mid X) = \frac{P(C_1)\prod_k P(x_k \mid C_1)}{P(C_1)\prod_k P(x_k \mid C_1) + P(C_2)\prod_k P(x_k \mid C_2)}$$
the mobile terminal obtaining the probability $P(C_2 \mid X)$ that the pending short message is a non-spam short message, as follows:
$P(C_2 \mid X) = 1 - P(C_1 \mid X)$
the mobile terminal obtaining the classification result of the pending short message, wherein when $P(C_1 \mid X) > P(C_2 \mid X)$ the pending short message is a spam short message, and otherwise the pending short message is a non-spam short message.
3. The method according to claim 2, characterized in that,
before the step of the mobile terminal preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message, the method further comprises:
the mobile terminal judging whether the sender number of the pending short message is in the private black and white lists corresponding to the mobile terminal, wherein when the sender number is in the private blacklist corresponding to the mobile terminal, the pending short message is a spam short message, and when the sender number is in the private whitelist corresponding to the mobile terminal, the pending short message is a non-spam short message;
when the sender number is not in the private black and white lists corresponding to the mobile terminal, the mobile terminal further judging whether the sender number is in the public black and white lists, wherein when the sender number is in the public blacklist, the pending short message is a spam short message, and when the sender number is in the public whitelist, the pending short message is a non-spam short message;
when the sender number is not in the public black and white lists, the mobile terminal performing the step of preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message.
4. The method according to claim 3, characterized in that,
when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction for the wrong classification result, the classification error information that the mobile terminal uploads to the cloud server further includes the sender number of the pending short message, the mobile terminal uploading the sender number to the cloud server so that it is judged whether to add the sender number to the private black and white lists corresponding to the mobile terminal and/or the public black and white lists stored in the cloud server;
when the private black and white lists corresponding to the mobile terminal and/or the public black and white lists stored in the cloud server are updated, the mobile terminal obtaining the private black and white list update information and/or public black and white list update information of the cloud server to synchronously update the public black and white lists and/or private black and white lists stored in the mobile terminal.
5. The method according to claim 1 or 4, characterized in that,
the wrong classification result is the classification of a pending short message that is a spam short message as a non-spam short message, or the classification of a pending short message that is a non-spam short message as a spam short message;
the lexicon update information includes at least, after the private short message training set is updated, the matching probability of the word features and rule features of the pending short message in spam short messages or in non-spam short messages, the proportion of spam short messages and the proportion of non-spam short messages.
6. A method for filtering spam short messages, characterized by comprising:
a cloud server learning from a private short message training set corresponding to a mobile terminal and a public short message training set that it stores to obtain a classification lexicon corresponding to the mobile terminal, the classification lexicon being used by the mobile terminal to classify a pending short message to obtain a classification result, wherein the classification result is spam short message or non-spam short message;
when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction for the wrong classification result, the cloud server receiving classification error information uploaded by the mobile terminal, wherein the classification error information includes the pending short message and the wrong classification result;
the cloud server adding the pending short message to the private short message training set corresponding to the mobile terminal to update the private short message training set;
after the private short message training set and/or the public short message training set is updated, the cloud server learning from the private short message training set and the public short message training set to obtain lexicon update information, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
7. The method according to claim 6, characterized in that,
the wrong classification result is the classification of a pending short message that is a spam short message as a non-spam short message, or the classification of a pending short message that is a non-spam short message as a spam short message;
when the wrong classification result is the classification of a pending short message that is a spam short message as a non-spam short message, the step of the cloud server learning from the private short message training set and the public short message training set to obtain lexicon update information after the private short message training set is updated specifically comprises:
the cloud server preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message;
the cloud server obtaining first lexicon update information according to the matching frequency of the word features and rule features in the spam short messages of the public short message training set, the matching frequency of the word features and rule features in the spam short messages of the private short message training set, and the number of spam short messages and the number of non-spam short messages in the private short message training set and the public short message training set, wherein the first lexicon update information includes, after the private short message training set is updated, the matching probability of the word features and rule features of the pending short message in spam short messages, the proportion of spam short messages and the proportion of non-spam short messages;
when the wrong classification result is the classification of a pending short message that is a non-spam short message as a spam short message, the step of the cloud server learning from the private short message training set and the public short message training set to obtain lexicon update information after the private short message training set is updated specifically comprises:
the cloud server preprocessing the pending short message to obtain the word features and rule features corresponding to the pending short message;
the cloud server obtaining second lexicon update information according to the matching frequency of the word features and rule features in the non-spam short messages of the public short message training set, the matching frequency of the word features and rule features in the non-spam short messages of the private short message training set, and the number of spam short messages and the number of non-spam short messages in the private short message training set and the public short message training set, wherein the second lexicon update information includes, after the private short message training set is updated, the matching probability of the word features and rule features of the pending short message in non-spam short messages, the proportion of spam short messages and the proportion of non-spam short messages.
8. The method according to claim 7, characterized in that,
the classification error information further includes the sender number of the pending short message; the cloud server judges whether to add the sender number to the private black and white lists corresponding to the mobile terminal and/or the public black and white lists stored in the cloud server, and if so, the cloud server updates the private black and white lists corresponding to the mobile terminal and/or the public black and white lists to obtain private black and white list update information and/or public black and white list update information, so that the mobile terminal synchronously updates the public black and white lists and/or private black and white lists stored in the mobile terminal.
9. A mobile terminal, characterized by comprising:
a classification module, configured to classify a pending short message according to a classification lexicon stored in the mobile terminal to obtain a classification result, wherein the classification result is spam short message or non-spam short message, and the classification lexicon is obtained by a cloud server learning from a private short message training set corresponding to the mobile terminal and a public short message training set that it stores;
an upload module, configured to upload classification error information to the cloud server to update the private short message training set corresponding to the mobile terminal when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction for the wrong classification result, wherein the classification error information includes the pending short message and the wrong classification result;
a mobile terminal update module, configured to obtain lexicon update information of the cloud server to synchronously update the classification lexicon stored in the mobile terminal, wherein the lexicon update information is obtained by the cloud server learning from the private short message training set and the public short message training set after the private short message training set and/or the public short message training set is updated, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
10. A cloud server, characterized by comprising:
a learning module, configured to learn from a private short message training set corresponding to a mobile terminal and a public short message training set stored in the cloud server to obtain a classification lexicon corresponding to the mobile terminal, the classification lexicon being used by the mobile terminal to classify a pending short message to obtain a classification result, wherein the classification result is spam short message or non-spam short message;
a cloud server update module, configured to receive classification error information uploaded by the mobile terminal when the classification result is judged to be a wrong classification result and the mobile terminal receives an upload instruction for the wrong classification result, wherein the classification error information includes the pending short message and the wrong classification result;
the cloud server update module being further configured to add the pending short message to the private short message training set corresponding to the mobile terminal to update the private short message training set;
the learning module being further configured to, after the private short message training set and/or the public short message training set is updated, learn from the private short message training set and the public short message training set to obtain lexicon update information, so that the mobile terminal synchronously updates the classification lexicon stored in the mobile terminal according to the lexicon update information, the lexicon update information being obtained when at least one of the private short message training set and the public short message training set is updated.
11. A system for filtering spam short messages, characterized by comprising: the mobile terminal according to claim 9 and the cloud server according to claim 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310279728.8A CN104284306B (en) | 2013-07-04 | 2013-07-04 | A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104284306A CN104284306A (en) | 2015-01-14 |
CN104284306B CN104284306B (en) | 2018-07-24 |
Family
ID=52258688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310279728.8A Active CN104284306B (en) | 2013-07-04 | 2013-07-04 | A kind of method for filtering spam short messages, system, mobile terminal and Cloud Server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104284306B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104967981A (en) * | 2015-07-06 | 2015-10-07 | 王小安 | Crank call and text message blocking method |
CN105162984B (en) * | 2015-09-23 | 2018-11-23 | 小米科技有限责任公司 | Telephone number recognition methods and device |
CN105307176B (en) * | 2015-11-10 | 2019-03-08 | 中国科学院信息工程研究所 | Robustness message routing method in a kind of mobile social opportunistic network |
CN110019773A (en) * | 2017-08-14 | 2019-07-16 | 中国移动通信有限公司研究院 | A kind of refuse messages detection method, terminal and computer readable storage medium |
CN107517452A (en) * | 2017-09-04 | 2017-12-26 | 上海连尚网络科技有限公司 | A kind of method, equipment and computer-readable storage medium for being used to manage short message |
CN112597282B (en) * | 2021-01-24 | 2021-06-11 | 深圳市诚立业科技发展有限公司 | Management method applied to short message data security |
CN115065972B (en) * | 2022-06-09 | 2024-01-12 | 通华大数据科技(烟台)有限公司 | Junk information clearing system based on communication big data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101877837A (en) * | 2009-04-30 | 2010-11-03 | 华为技术有限公司 | Method and device for short message filtration |
CN102547623A (en) * | 2010-12-08 | 2012-07-04 | 中国电信股份有限公司 | Junk short message processing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN104284306A (en) | 2015-01-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||