CN103607705B - Method for filtering spam short messages and engine - Google Patents

Publication number
CN103607705B
Authority
CN
China
Prior art keywords
url
telephone number
credit rating
note
storehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310646010.8A
Other languages
Chinese (zh)
Other versions
CN103607705A (en)
Inventor
史领航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Netqin Technology Co Ltd
Original Assignee
Beijing Netqin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Netqin Technology Co Ltd
Priority to CN201310646010.8A
Publication of CN103607705A
Application granted
Publication of CN103607705B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method for filtering spam short messages and a spam short message filtering engine. The method may include: extracting a telephone number and/or a uniform resource locator (URL) from the content of a short message; retrieving the reputation level of the extracted telephone number and/or URL from a reputation-level database of telephone numbers and/or URLs; and judging, at least according to the reputation level of the telephone number and/or URL, whether the short message is a spam message, wherein the reputation-level database of telephone numbers and/or URLs is determined from a collected set of short message samples. The method according to embodiments of the invention can improve the efficiency of filtering spam messages that are sent from legitimate commercial SMS numbers provided by operators but contain malicious telephone numbers and/or malicious web addresses.

Description

Method for filtering spam short messages and engine
Technical field
The present invention relates to the field of mobile communications, and more particularly to a method and a device for filtering spam short messages.
Background art
In recent years, as mobile phones have become more widespread and the cost of short messages has fallen, short messages are increasingly used for marketing and even for fraud. The large volume of short messages that ordinary users do not want to receive, or that are irrelevant to them, is referred to as spam short messages (spam messages). According to statistics, about 35% of mobile phone users have been harassed by spam messages to some degree, and each user receives about 8 spam messages per month on average. A conservative estimate, based on the statistic that China had 1.146 billion mobile phone users at the end of March 2013, puts the total number of spam messages received by Chinese mobile phone users at more than 300 million per day on average. Spam messages have become a serious social problem.
To avoid harassment by spam messages, smartphone users usually install a spam message filtering engine. At present, the spam filtering engines on the market identify spam messages mainly by whether the sender's number is on a blacklist or a whitelist. However, more and more spam senders use commercial SMS numbers that offer bulk-sending services and are purchased from operators such as China Mobile, China Unicom, and China Telecom (for example, numbers beginning with an area code such as 021 or 075, or numbers beginning with 106) to send messages, and place their real contact telephone numbers in the message content. A mechanism based on existing blacklists and whitelists of sender numbers therefore easily misses spam among the messages sent from these commercial SMS numbers (false negatives). If the user instead blacklists all such commercial SMS numbers, promotional messages from genuine merchants that the user wishes to receive may be misreported as spam (false positives).
Accordingly, an improved message filtering mechanism is needed that reduces both false negatives and false positives for spam messages.
Summary of the invention
To this end, the invention provides an improved spam message filtering method and spam message filtering engine that consider not only the sender's number but also the reputation level of the telephone numbers and/or URLs contained in the message content. The method according to the invention can improve the efficiency of filtering spam messages that are sent from legitimate commercial SMS numbers provided by operators but contain malicious telephone numbers and/or malicious web addresses.
According to one aspect of the invention, a spam message filtering method is provided. The method may include: extracting a telephone number and/or a uniform resource locator (URL) from the content of a short message; retrieving the reputation level of the extracted telephone number and/or URL from a reputation-level database of telephone numbers and/or URLs; and judging, at least according to the reputation level of the telephone number and/or URL, whether the short message is a spam message, wherein the reputation-level database of telephone numbers and/or URLs is determined from a collected set of short message samples.
In some embodiments of the invention, judging whether the short message is a spam message is further based at least on: the reputation level of the sender of the short message, a keyword matching result, and/or a semantics-based policy.
In some embodiments of the invention, judging whether the short message is a spam message may further include: calculating a suspicion degree of the short message; and, if the suspicion degree of the short message is greater than a threshold, judging that the short message is a spam message.
Preferably, the suspicion degree of the short message is calculated according to the following formula:
$$ s = W \times \sum_{i=1}^{N} \frac{l_i}{L} + \sum_{j=1}^{M} v_j $$
where s denotes the suspicion degree of the short message; $l_i$ denotes the retrieved reputation level of a telephone number or URL; L denotes the number of levels defined in the reputation-level database; i = 1, ..., N, where N, an integer greater than zero, is the total number of extracted telephone numbers and URLs for which a reputation level was retrieved; W is the weight, within the overall filtering system, of the factor related to the reputation level of suspicious telephone numbers and/or URLs in the message content; and $v_j$ denotes the contribution of another suspicious factor used to judge spam messages, with j = 1, ..., M, where M, an integer greater than or equal to 1, is the number of such other factors.
In some embodiments of the invention, the method may further include: periodically updating the short message sample set and the reputation-level database of telephone numbers and/or URLs.
According to a second aspect of the invention, a spam message filtering engine is provided. The spam message filtering engine may include: an extraction unit, configured to extract a telephone number and/or a uniform resource locator (URL) from the content of a short message; a retrieval unit, configured to retrieve the reputation level of the extracted telephone number and/or URL from a reputation-level database of telephone numbers and/or URLs; and a judging unit, configured to judge, at least according to the reputation level of the telephone number and/or URL, whether the short message is a spam message, wherein the reputation-level database of telephone numbers and/or URLs is determined from a short message sample set.
In some embodiments of the invention, the judging unit judges whether the short message is a spam message further based at least on: the reputation level of the sender of the short message, a keyword matching result, and/or a semantics-based policy.
In some embodiments of the invention, the judging unit further includes a calculation subunit that calculates the suspicion degree of the short message. The judging unit is configured to judge that the short message is a spam message if its suspicion degree is greater than a threshold.
In some embodiments of the invention, the calculation subunit may be configured to calculate the suspicion degree of the short message according to the following formula:
$$ s = W \times \sum_{i=1}^{N} \frac{l_i}{L} + \sum_{j=1}^{M} v_j $$
where s denotes the suspicion degree of the short message; $l_i$ denotes the retrieved reputation level of a telephone number or URL; L denotes the number of levels defined in the reputation-level database; i = 1, ..., N, where N, an integer greater than zero, is the total number of extracted telephone numbers and URLs for which a reputation level was retrieved; W is the weight, within the overall filtering system, of the factor related to the reputation level of suspicious telephone numbers and/or URLs in the message content; and $v_j$ denotes the contribution of another suspicious factor used to judge spam messages, with j = 1, ..., M, where M, an integer greater than or equal to 1, is the number of such other factors.
In some embodiments of the invention, the spam message filtering engine may further include: an updating unit, configured to periodically update the short message sample set and the reputation-level database of telephone numbers and/or URLs.
Brief description of the drawings
The above and other objects, features, and advantages of the invention will become clearer from the following description of preferred embodiments of the invention with reference to the accompanying drawings, in which:
Fig. 1 schematically shows an application scenario of a mobile communication system 100 in which embodiments of the invention can be used;
Fig. 2 schematically shows a flowchart of a method, according to an embodiment of the invention, for building a reputation grading system based on the telephone numbers and/or URLs in message content;
Fig. 3 schematically shows a flowchart of a spam message filtering method according to an embodiment of the invention; and
Fig. 4 schematically shows a block diagram of a spam message filtering engine according to an embodiment of the invention.
Throughout the drawings, identical or similar structures are identified by identical or similar reference numerals.
Detailed description of the invention
The invention is now described in detail with reference to the accompanying drawings, which show illustrative embodiments of the invention so that those skilled in the art can practice it. Note that the drawings and examples below are not meant to limit the scope of the invention to a single embodiment; other embodiments are possible by interchanging some or all of the described or illustrated elements or by combining different embodiments. Where certain elements of the invention can be partially or fully implemented using known components, only those portions of the known components that are necessary for understanding the invention are described, and detailed descriptions of the other portions are omitted so as not to obscure the invention. Unless otherwise specified herein, those skilled in the art will understand that although some embodiments of the invention are described as implemented in software, the invention is not so limited and may also be implemented in hardware or in a combination of software and hardware, and vice versa. Unless explicitly stated otherwise, an embodiment showing a single component should not be considered limiting in this specification; rather, the invention is intended to cover other embodiments that include a plurality of the same component, and vice versa. In addition, the invention covers present and future equivalents of the known components referred to herein by way of illustration.
As noted above, more and more spam senders now use commercial SMS numbers that offer bulk-sending services and are legally purchased from operators such as China Mobile, China Unicom, and China Telecom (for example, numbers beginning with an area code such as 021 or 075, or numbers beginning with 106) to send messages, and place their real contact telephone numbers in the message content. A mechanism based on existing blacklists and whitelists of sender numbers therefore easily misses spam among the messages sent from these commercial SMS numbers. If the user instead blacklists all such commercial SMS numbers, promotional messages from genuine merchants that the user wishes to receive may be misreported as spam. To improve spam filtering efficiency and reduce false negatives and false positives, the invention proposes that the spam filtering mechanism consider, in addition to other factors, the reputation level of the telephone numbers and/or URLs contained in the message content.
Fig. 1 shows a schematic diagram of an application scenario of the mobile communication system 100 according to the invention. As shown in Fig. 1, the mobile communication system 100 may include mobile terminals 120 and a server 110. As an example, four mobile terminals 120-1, 120-2, 120-3, and 120-4 are shown in the figure. It should be understood, however, that the system 100 may include more or fewer mobile terminals. The mobile terminals 120 are connected to the server 110 through a communication network 130. Examples of the communication network 130 include, but are not limited to, the Internet and mobile communication networks.
The server 110 is typically maintained and managed by the application vendor that provides the spam message filtering engine. Through the server 110, the application vendor can collect short messages reported by users to generate a short message sample set, and process the sample set to improve the filtering performance of the engine and reduce false positives and false negatives (described in detail below with reference to Fig. 2). Although only one server 110 is shown in the figure, it should be understood that there may be two or more servers 110. It should also be understood that the server 110 may be a single physical entity or may be distributed over two or more physical entities.
The mobile terminal 120 may be any mobile terminal capable of sending and receiving short messages. A spam message filtering engine according to an embodiment of the invention can be installed on the mobile terminal 120. In the invention, short messages include not only SMS messages but also multimedia messages (MMS). When a user finds that the spam filtering engine has missed or misreported a short message, the user can use the mobile terminal 120 to report that message to the application vendor. It should be understood that the invention is not limited to any particular communication protocol for the mobile terminals involved; the protocols may include, but are not limited to, 2G, 3G, 4G, and 5G wireless communication technologies, WCDMA, CDMA2000, and TD-SCDMA. Different mobile terminals may use the same communication protocol or different ones. Nor is the invention limited to any particular mobile terminal operating system; the operating systems may include, but are not limited to, Android, iOS, Windows Mobile, Symbian, Windows Phone, and Blackberry OS. Different mobile terminals may use the same operating system or different ones.
The server 110 and the mobile terminals 120 may communicate via various wireless communication protocols, including 2G, 3G, 4G, and 5G networks, WCDMA, CDMA2000, and TD-SCDMA systems, wireless local area networks (WLAN), and so on.
The invention proposes building a reputation grading system based on the telephone numbers and/or URLs in message content (also referred to as the reputation-level database of telephone numbers and/or URLs) by collecting short message samples. This is described below with reference to Fig. 2.
Fig. 2 schematically shows a flowchart of a method 200, according to an embodiment of the invention, for building a reputation grading system based on the telephone numbers and/or URLs in message content.
The method 200 begins at step S210, in which short message samples are collected. Samples can be collected through reports from client users or through users voluntarily uploading messages to a server (for example, the server 110).
In step S220, the collected samples are classified to obtain a spam message sample set and a normal message sample set.
In step S230, the telephone numbers/URLs in the message content are extracted from each of the two sample sets to obtain a telephone number/URL list.
In step S240, for each telephone number/URL in the resulting list, the number of times it appears in normal messages and/or the number of times it appears in spam messages is counted.
In step S250, the reputation value of each telephone number/URL is calculated from these statistics. Preferably, to simplify calculation and storage, a reputation level can also be derived from the reputation value.
In one exemplary embodiment, the reputation value can be calculated according to the following formula:
$$ r = \frac{\mathit{ts}}{\mathit{th} + \mathit{ts}} $$
where r denotes the reputation value of the telephone number/URL, th denotes the number of times the telephone number/URL appears in normal messages, and ts denotes the number of times it appears in spam messages. The lower the value of r, the better the reputation of the telephone number/URL. After the reputation value is obtained, it can also be graded to simplify calculation and storage. For example, reputation can be divided into five levels. When r = 0, the telephone number/URL is assigned reputation level 0, the best reputation. When 0 < r ≤ 0.33, it is assigned level 1, a good reputation. When 0.33 < r ≤ 0.67, it is assigned level 2, an average reputation. When 0.67 < r < 1, it is assigned level 3, a poor reputation. When r = 1, it is assigned level 4, the worst reputation. In this way, the reputation-level database of telephone numbers/URLs is obtained. It should be understood that the above way of calculating the reputation value and the above grading are merely illustrative and do not limit the invention.
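As a minimal sketch of the example above, the following Python computes the reputation value r = ts / (th + ts) and maps it to the five example levels. The function names `reputation_value` and `reputation_level` are illustrative and do not come from the patent. The text does not say how to treat a number or URL that was never observed (th + ts = 0), so this sketch simply rejects that case.

```python
def reputation_value(th: int, ts: int) -> float:
    """Return r = ts / (th + ts); a lower r means a better reputation.

    th: occurrences in normal messages, ts: occurrences in spam messages.
    """
    total = th + ts
    if total == 0:
        # Assumption: an unseen number/URL has no reputation value at all.
        raise ValueError("number/URL does not occur in the sample set")
    return ts / total


def reputation_level(r: float) -> int:
    """Map r to the five example levels, 0 (best) to 4 (worst)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if r == 0:
        return 0
    if r <= 0.33:
        return 1
    if r <= 0.67:
        return 2
    if r < 1:
        return 3
    return 4
```

For instance, a number seen 9 times in normal messages and once in spam has r = 0.1 and falls in level 1, while a number seen only in spam has r = 1 and falls in level 4.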
In other embodiments, the reputation value can be calculated by other formulas. For example, it can be calculated as r = th / (th + ts), where r, th, and ts have the meanings given above. In this case, the higher the value of r, the better the reputation. In addition, reputation can be divided into fewer or more than five levels, for example 3 levels (good, medium, poor) or 6 levels.
After the reputation level of each telephone number/URL in the list from step S240 has been obtained in step S250, the reputation-level database of telephone numbers and/or URLs is generated at the server side, and the method 200 ends. It should be understood that, in embodiments of the invention, the reputation value of a telephone number/URL can be stored and maintained directly once obtained, without calculating its reputation level. In other words, the reputation value can be regarded as a reputation level with infinitely fine grading. Hereinafter, embodiments of the invention are described primarily in terms of reputation level, and no distinction is made between reputation value and reputation level unless expressly stated otherwise.
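Steps S230 to S250 can be sketched roughly as follows. This is only an illustration under stated assumptions: each sample is represented by the telephone numbers/URLs already extracted from it, a number/URL is counted at most once per message, and the database stores the reputation value r = ts / (th + ts) directly rather than a graded level. The name `build_reputation_database` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_reputation_database(
    normal_samples: List[Iterable[str]],
    spam_samples: List[Iterable[str]],
) -> Dict[str, float]:
    """Return {number_or_url: r} with r = ts / (th + ts), lower meaning better."""
    th: Counter = Counter()  # occurrences in normal messages
    ts: Counter = Counter()  # occurrences in spam messages
    for tokens in normal_samples:
        th.update(set(tokens))  # assumption: count once per message
    for tokens in spam_samples:
        ts.update(set(tokens))
    # Every key occurs in at least one set, so the denominator is never zero.
    return {tok: ts[tok] / (th[tok] + ts[tok]) for tok in th.keys() | ts.keys()}


# Hypothetical, already-classified samples (step S220 output).
normal = [["02112345678"], ["02112345678", "http://shop.example/sale"]]
spam = [["13800001111"], ["13800001111", "02112345678"]]
database = build_reputation_database(normal, spam)
```

Here the number that appears only in spam samples gets r = 1, the URL that appears only in normal samples gets r = 0, and the number seen twice in normal messages and once in spam gets r = 1/3.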
After generating the reputation-level database of telephone numbers and/or URLs as described above, the server can send the database to the mobile terminals for use by the spam message filtering engine on the terminal. A mobile terminal can thus hold a copy of part or all of the reputation-level database of telephone numbers and/or URLs.
Preferably, the method 200 may also include an optional update step. In this update step, the short message sample set (including the normal message sample set and the spam message sample set) is updated periodically, and the reputation-level database of telephone numbers and/or URLs is updated accordingly. The sample set can be updated by performing sample collection and classification operations similar to steps S210 and S220. Updating the reputation-level database may include: updating the telephone numbers/URLs in the database; and updating the reputation levels of the telephone numbers/URLs in the database. Once enough short message samples have been collected, the reputation-level database of telephone numbers and/or URLs will tend toward accuracy.
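One possible way to support such periodic updates, shown here only as a sketch, is to keep the raw occurrence counts and recompute the reputation value on demand, so that newly classified samples change both the set of known numbers/URLs and their reputation. The class name `ReputationDatabase` and its methods are hypothetical and not taken from the patent.

```python
from collections import Counter
from typing import Iterable, Optional


class ReputationDatabase:
    """Keeps raw counts so reputation values follow newly collected samples."""

    def __init__(self) -> None:
        self.normal_counts: Counter = Counter()  # th per number/URL
        self.spam_counts: Counter = Counter()    # ts per number/URL

    def add_sample(self, tokens: Iterable[str], is_spam: bool) -> None:
        """Record the numbers/URLs of one newly classified message."""
        counts = self.spam_counts if is_spam else self.normal_counts
        counts.update(set(tokens))  # assumption: count once per message

    def reputation(self, token: str) -> Optional[float]:
        """Return r = ts / (th + ts), or None if the token was never seen."""
        th, ts = self.normal_counts[token], self.spam_counts[token]
        return None if th + ts == 0 else ts / (th + ts)
```

With this structure, a number first seen in one spam sample has r = 1; if it later appears in one normal sample as well, r drops to 0.5.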
Correspondingly, when the server updates the reputation-level database of telephone numbers and/or URLs, it can notify the mobile terminals of the update or send them the updated content, so that the updated reputation levels of telephone numbers and/or URLs are available to the spam message filtering engine on the mobile terminal.
It should be understood that embodiments of the invention may consider either the telephone numbers or the URLs in the message content, or both at once; accordingly, a reputation-level database is generated for telephone numbers, for URLs, or for both. When reputation-level databases are generated for both telephone numbers and URLs, two separate databases can be generated, or a single combined database covering both. The invention is not limited in this respect.
The spam message filtering mechanism of the invention is described below from the perspective of the mobile terminal, with reference to Fig. 2 and Fig. 3.
Fig. 3 schematically shows a flowchart of a spam message filtering method 300 according to an embodiment of the invention.
When the mobile terminal receives a short message, the spam message filtering engine installed on the terminal performs the method 300 to judge whether the message is a spam message.
In step S310, the telephone numbers and/or uniform resource locators (URLs) in the message content are extracted. Suppose the message content contains a telephone number/URL (for example, telephone number/URL X). Step S310 then extracts the telephone number and/or URL X. It should be understood that the message content may contain both telephone numbers and URLs, and may include more than one telephone number and more than one URL.
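The patent does not specify how the extraction in step S310 is performed. One simple possibility, sketched below, uses regular expressions. The patterns `URL_RE` and `PHONE_RE` are deliberately loose, hypothetical choices (URLs starting with `http://`, `https://`, or `www.`; telephone numbers as runs of 7 to 12 digits) and would need tuning for real traffic.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; the source does not define the extraction rules.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)
PHONE_RE = re.compile(r"(?<!\d)\d{7,12}(?!\d)")


def extract_numbers_and_urls(content: str) -> Tuple[List[str], List[str]]:
    """Return (telephone_numbers, urls) found in the message content."""
    urls = URL_RE.findall(content)
    # Blank out URLs first so digit runs inside a URL are not taken for numbers.
    remainder = URL_RE.sub(" ", content)
    phones = PHONE_RE.findall(remainder)
    return phones, urls


phones, urls = extract_numbers_and_urls(
    "Low-rate loans! Call 13800001111 or visit http://example.com/a1234567 now"
)
```

In this example the digits inside the URL path are not reported as a telephone number, because URLs are removed before the number pattern is applied.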
In step S320, the reputation level of the extracted telephone number and/or URL (for example, X) is retrieved from the reputation-level database of telephone numbers and/or URLs. As described above, the database is determined by the server from the collected short message sample set, and the mobile terminal may hold a copy of part or all of the (server-generated) database. In step S320, the spam filtering engine can first look up the reputation level of the extracted telephone number/URL X in the local copy of the database. If the reputation level of X is not found locally, the engine can query the server-side reputation-level database for the reputation level of the telephone number/URL X. If no reputation level is retrieved for any of the telephone numbers/URLs extracted in step S310, the message can be processed in the conventional manner and the method 300 ends. If a reputation level is retrieved for all or some of the telephone numbers and/or URLs extracted in step S310, the method 300 proceeds to step S330.
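The local-first lookup with a server fallback described in step S320 might look like the following sketch. The local copy is modeled as a plain dictionary and the server query as an optional callable; `retrieve_levels` and `query_server` are hypothetical names, and no real network protocol is implied.

```python
from typing import Callable, Dict, Iterable, Optional


def retrieve_levels(
    tokens: Iterable[str],
    local_copy: Dict[str, int],
    query_server: Optional[Callable[[str], Optional[int]]] = None,
) -> Dict[str, int]:
    """Return the reputation levels that could be found for the given tokens."""
    found: Dict[str, int] = {}
    for token in tokens:
        level = local_copy.get(token)  # try the local copy first
        if level is None and query_server is not None:
            level = query_server(token)  # fall back to the server-side database
        if level is not None:  # level 0 is a valid result, so compare with None
            found[token] = level
    return found


# Stand-ins for the local copy and the server-side database.
local = {"13800001111": 4}
server = {"http://bad.example": 3}
levels = retrieve_levels(
    ["13800001111", "http://bad.example", "02112345678"], local, server.get
)
```

An empty result corresponds to the case where no reputation level is retrieved and the message is handled in the conventional manner; tokens without a level are simply left out, matching the option of ignoring them in step S330.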
In step S330, whether the short message is a spam message is judged at least according to the retrieved reputation level of the telephone number and/or URL X. In this step, telephone numbers and/or URLs for which no reputation level was retrieved can be ignored. It should be understood that, in judging spam, other factors can be considered in addition to the reputation level of the telephone numbers and/or URLs contained in the message content, such as those already considered in the prior art: the reputation level of the message sender, keyword matching results, and/or semantics-based policies.
In one implementation of step S330, judging whether the received message is a spam message may include: calculating the suspicion degree of the message; and, if the suspicion degree is greater than a threshold, judging that the message is a spam message. If the message is judged to be spam, it is intercepted and placed in the spam box; if it is judged not to be spam, it is let through so that it appears in the message inbox.
A specific embodiment is considered below to describe in detail how the suspicion degree of a message is calculated. In this embodiment, the reputation of telephone numbers/URLs is divided into L levels in the reputation-level database, with levels 0 to L-1 corresponding to reputation from best to worst.
In this embodiment, the suspicion degree of a message can be calculated according to formula (1):
$$ s = W \times \sum_{i=1}^{N} \frac{l_i}{L} + \sum_{j=1}^{M} v_j \qquad (1) $$
where s denotes the suspicion degree of the message; $l_i$ denotes the retrieved reputation level of a telephone number or URL; L denotes the number of levels defined in the reputation-level database; i = 1, ..., N, where N, an integer greater than zero, is the total number of extracted telephone numbers and URLs for which a reputation level was retrieved; W is the weight, within the overall filtering system, of the factor related to the reputation level of suspicious telephone numbers and/or URLs in the message content; and $v_j$ denotes the contribution of another suspicious factor used to judge spam messages (such as the reputation level of the message sender, keyword matching results, and/or semantics-based policies), with j = 1, ..., M, where M, an integer greater than or equal to 1, is the number of such other factors.
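To make formula (1) concrete, here is a small Python sketch of the suspicion-degree calculation and the threshold test. The weight, threshold, and contribution values in the example are made up for illustration; the patent does not give concrete numbers, and the function names are hypothetical.

```python
from typing import Sequence


def suspicion_degree(
    levels: Sequence[int],
    num_levels: int,
    weight: float,
    other_contributions: Sequence[float],
) -> float:
    """s = W * sum(l_i / L) + sum(v_j), as in formula (1)."""
    if not levels:
        raise ValueError("formula (1) needs at least one retrieved level (N > 0)")
    content_term = sum(level / num_levels for level in levels)
    return weight * content_term + sum(other_contributions)


def is_spam(score: float, threshold: float) -> bool:
    """A message is judged spam when its suspicion degree exceeds the threshold."""
    return score > threshold


# Illustrative values only: two retrieved levels (4 and 3) out of L = 5 levels,
# W = 0.5, and two other factors contributing 0.2 and 0.1.
score = suspicion_degree([4, 3], num_levels=5, weight=0.5,
                         other_contributions=[0.2, 0.1])
```

With these values the content term is 4/5 + 3/5 = 1.4, so s is approximately 0.5 × 1.4 + 0.3 = 1.0, and the message would be judged spam for any threshold below that.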
It should be understood that the invention is not limited to calculating the suspicion degree of a message according to formula (1). Alternatively, again with reference to the above specific embodiment, the suspicion degree can be calculated according to the following formula (2):
$$ s = W \times \max_{i} \frac{l_i}{L} + \sum_{j=1}^{M} v_j \qquad (2) $$
where s denotes the suspicion degree of the message; $l_i$ denotes the reputation level of an extracted telephone number or URL; L denotes the number of levels defined in the reputation-level database; $\max_i$ denotes taking the maximum over i = 1, ..., N, where N, an integer greater than zero, is the total number of extracted telephone numbers and URLs; W is the weight, within the overall filtering system, of the factor related to the reputation level of suspicious telephone numbers and/or URLs in the message content; and $v_j$ denotes the contribution of another suspicious factor used to judge spam messages, with j = 1, ..., M, where M, an integer greater than or equal to 1, is the number of such other factors.
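The difference between the summed term of formula (1) and the maximum term of formula (2) is easiest to see on an example. The sketch below computes only the content-related term of each formula; the function names and level values are illustrative assumptions.

```python
from typing import Sequence


def content_term_sum(levels: Sequence[int], num_levels: int) -> float:
    """Content term of formula (1): sum of l_i / L over all retrieved items."""
    return sum(level / num_levels for level in levels)


def content_term_max(levels: Sequence[int], num_levels: int) -> float:
    """Content term of formula (2): the largest l_i / L among the items."""
    return max(level / num_levels for level in levels)


# Three well-rated numbers (level 1) and one badly rated URL (level 4), L = 5.
levels = [1, 1, 1, 4]
summed = content_term_sum(levels, 5)   # grows with every extra item
worst = content_term_max(levels, 5)    # depends only on the worst item
```

Here the summed term is about 1.4 while the maximum term is 0.8. With the sum, a message that lists many moderately rated numbers can accumulate a high score; with the maximum, only the worst-rated number or URL matters.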
In an alternative implementation of step S330, judging whether the received message is a spam message may include: calculating the credibility of the message; and, if the credibility is lower than a threshold, judging that the message is a spam message.
Referring again to the above specific embodiment, the credibility of a message can be calculated according to formula (3):
$$ r = R - W \times \sum_{i=1}^{N} \frac{l_i}{L} - \sum_{j=1}^{M} v_j \qquad (3) $$
where r denotes the credibility of the message; R is a preset credibility reference value; $l_i$ denotes the reputation level of an extracted telephone number or URL; L denotes the number of levels defined in the reputation-level database; i = 1, ..., N, where N, an integer greater than zero, is the total number of extracted telephone numbers and URLs; W is the weight, within the overall filtering system, of the factor related to the reputation level of suspicious telephone numbers and/or URLs in the message content; and $v_j$ denotes the contribution of another suspicious factor used to judge spam messages, with j = 1, ..., M, where M, an integer greater than or equal to 1, is the number of such other factors.
Alternatively, the credibility of a message can be calculated according to formula (4):
$$ r = R - W \times \max_{i} \frac{l_i}{L} - \sum_{j=1}^{M} v_j \qquad (4) $$
where r denotes the credibility of the message; R is a preset credibility reference value; $l_i$ denotes the reputation level of an extracted telephone number or URL; L denotes the number of levels defined in the reputation-level database; $\max_i$ denotes taking the maximum over i = 1, ..., N, where N, an integer greater than zero, is the total number of extracted telephone numbers and URLs; W is the weight, within the overall filtering system, of the factor related to the reputation level of suspicious telephone numbers and/or URLs in the message content; and $v_j$ denotes the contribution of another suspicious factor used to judge spam messages, with j = 1, ..., M, where M, an integer greater than or equal to 1, is the number of such other factors.
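The credibility variant can be sketched in the same way. The code below assumes that the contributions of the other suspicious factors are subtracted in both the summed and the maximum variant, as in formula (3), so that credibility equals the reference value R minus the corresponding suspicion degree. The function name, the `use_max` switch, and all numeric values are illustrative assumptions.

```python
from typing import Sequence


def credibility(
    levels: Sequence[int],
    num_levels: int,
    weight: float,
    other_contributions: Sequence[float],
    reference: float,
    use_max: bool = False,
) -> float:
    """r = R - W * agg(l_i / L) - sum(v_j), with agg = sum (default) or max."""
    if not levels:
        raise ValueError("at least one reputation level is required (N > 0)")
    ratios = [level / num_levels for level in levels]
    content_term = max(ratios) if use_max else sum(ratios)
    return reference - weight * content_term - sum(other_contributions)


# Illustrative values only: R = 1.0, levels 4 and 3 out of L = 5, W = 0.5.
r_sum = credibility([4, 3], 5, 0.5, [0.2, 0.1], reference=1.0)
r_max = credibility([4, 3], 5, 0.5, [0.2, 0.1], reference=1.0, use_max=True)
```

With these values the summed variant gives a credibility of about 0.0 and the maximum variant about 0.3; a message is judged spam when its credibility falls below the chosen threshold.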
Method for filtering spam short messages according to embodiments of the present invention is described above by reference to concrete example.But, should This understanding, the invention is not restricted to above-mentioned implementing.Such as, the feelings in different credit rating storehouses are used at telephone number with URL Under condition, easily above-mentioned formula (1) can be revised as following formula (5), realize another embodiment.
$$ s = W_1 \times \sum_{i_1} \left( l_{i_1} / L_1 \right) + W_2 \times \sum_{i_2} \left( l_{i_2} / L_2 \right) + \sum_j v_j \qquad (5) $$
Here, s denotes the suspicious degree of the short message; l_{i1} and l_{i2} denote the reputation levels of the retrieved telephone numbers and URLs, respectively; L_1 and L_2 denote the number of levels defined in the telephone number reputation level library and the URL reputation level library, respectively; i1 = 1, …, N1, where N1 is the total number of telephone numbers that were extracted and whose reputation level was retrieved; i2 = 1, …, N2, where N2 is the total number of URLs that were extracted and whose reputation level was retrieved; N1 and N2 are each integers greater than zero; W_1 and W_2 are the weights, within the overall filtering system, of the factors related to the reputation levels of suspicious telephone numbers and URLs in the short message content, respectively; and v_j denotes the contribution of the other suspicious factors used to identify spam short messages (for example, the sender reputation level of the short message, keyword matching results and/or semantics-based policies), with j = 1, …, M, where M is the number of such other factors and is an integer greater than or equal to 1.
Similarly, formulas (2) to (4) can be modified in the same way to accommodate the case in which telephone numbers and URLs use different reputation level libraries.
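The split-library variant in formula (5) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `suspicion_split`, the parameter names and all numeric values are hypothetical.

```python
from typing import Sequence


def suspicion_split(phone_levels: Sequence[int], phone_num_levels: int,
                    url_levels: Sequence[int], url_num_levels: int,
                    phone_weight: float, url_weight: float,
                    other_contributions: Sequence[float]) -> float:
    """Compute s = W1*sum(l_i1/L1) + W2*sum(l_i2/L2) + sum_j v_j, formula (5)."""
    if phone_num_levels <= 0 or url_num_levels <= 0:
        raise ValueError("L1 and L2 must be positive")
    # Each library normalizes by its own number of levels.
    phone_term = phone_weight * sum(l / phone_num_levels for l in phone_levels)
    url_term = url_weight * sum(l / url_num_levels for l in url_levels)
    return phone_term + url_term + sum(other_contributions)


# Hypothetical values: one phone number at level 2 of 4, one URL at level 5 of 5.
s = suspicion_split(phone_levels=[2], phone_num_levels=4,
                    url_levels=[5], url_num_levels=5,
                    phone_weight=0.5, url_weight=0.25,
                    other_contributions=[0.25])
```

Normalizing each term by its own level count lets the two libraries use different scales while the weights W_1 and W_2 control their relative influence.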
Fig. 4 schematically shows a block diagram of a spam short message filtering engine 400 according to an embodiment of the present invention. The engine can be installed on a mobile terminal as a client. As shown in the figure, the spam short message filtering engine 400 may include an extraction unit 410, a retrieval unit 420, a judging unit 430 and a storage unit 440.
The extraction unit 410 may be configured to extract telephone numbers and/or uniform resource locators (URLs) from the short message content.
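As a rough illustration of what such an extraction step might look like, the following Python sketch pulls URLs and digit runs out of a message body. The source does not specify any extraction rules, so the two regular expressions, the 7 to 15 digit length range and the function name `extract_targets` are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; real deployments would need locale-specific rules.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_PATTERN = re.compile(r"(?<!\d)\+?\d{7,15}(?!\d)")


def extract_targets(text: str) -> Tuple[List[str], List[str]]:
    """Return (telephone numbers, URLs) found in the message text."""
    urls = URL_PATTERN.findall(text)
    # Blank out URLs first so digits inside a URL are not taken as numbers.
    remainder = URL_PATTERN.sub(" ", text)
    phones = PHONE_PATTERN.findall(remainder)
    return phones, urls


phones, urls = extract_targets(
    "You won! Call 13800138000 or visit http://example.test/prize123456789 now"
)
```

Note that the simple URL pattern keeps any punctuation attached to the end of a URL, so a production extractor would likely trim or normalize the matches before looking them up.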
The retrieval unit 420 may be configured to retrieve the reputation level of each extracted telephone number and/or URL from the reputation level library of telephone numbers and/or URLs.
The judging unit 430 may be configured to judge, at least according to the reputation level of the telephone number and/or URL, whether the short message is a spam short message. The factors the judging unit considers when identifying spam short messages may at least further include the sender reputation level of the short message, keyword matching results and/or semantics-based policies.
The judging unit 430 may further include a computation subunit that calculates the suspicious degree of the short message; if the suspicious degree is greater than a threshold, the short message is judged to be a spam short message. For example, the computation subunit may be configured to calculate the suspicious degree of the short message according to formula (1), (2) or (5) above, or another formula.
Alternatively, the judging unit 430 may further include a computation subunit that calculates the credibility of the short message; if the credibility is less than a threshold, the short message is judged to be a spam short message. For example, the computation subunit may be configured to calculate the credibility of the short message according to formula (3) or (4) above, or another formula.
Optionally, the spam short message filtering engine 400 may also include an update unit. The update unit may be configured to periodically update the set of short message samples and the reputation level library of telephone numbers and/or URLs.
The extraction unit 410, the retrieval unit 420 and the judging unit 430 can respectively perform the operations of steps S310, S320 and S330 described above, which are not repeated here.
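The cooperation of the three units can be sketched in Python as below. This is a hedged illustration only: the class name `SpamFilterEngine`, its method names, the regular expression and the example library entries are hypothetical, and the suspicious degree uses the form s = W × Σ(l_i / L) + Σ v_j given in the claims. Items that have no entry in the library are skipped, and a message with no rated items simply contributes a zero reputation term, which is an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical pattern: a URL, or a standalone run of 7 to 15 digits.
TOKEN_PATTERN = re.compile(r"(?:https?://|www\.)\S+|(?<!\d)\d{7,15}(?!\d)")


@dataclass
class SpamFilterEngine:
    reputation_library: Dict[str, int]  # item -> reputation level l_i
    num_levels: int                     # L, number of levels in the library
    weight: float                       # W
    threshold: float                    # suspicious-degree threshold

    def extract(self, text: str) -> List[str]:
        """Extraction unit: find telephone numbers and URLs in the text."""
        return TOKEN_PATTERN.findall(text)

    def retrieve(self, items: Sequence[str]) -> List[int]:
        """Retrieval unit: look up levels, skipping items without an entry."""
        return [self.reputation_library[item] for item in items
                if item in self.reputation_library]

    def suspicion(self, text: str,
                  other_contributions: Sequence[float] = ()) -> float:
        """Computation subunit: s = W * sum(l_i / L) + sum(v_j)."""
        levels = self.retrieve(self.extract(text))
        reputation_term = sum(level / self.num_levels for level in levels)
        return self.weight * reputation_term + sum(other_contributions)

    def is_spam(self, text: str,
                other_contributions: Sequence[float] = ()) -> bool:
        """Judging unit: spam if the suspicious degree exceeds the threshold."""
        return self.suspicion(text, other_contributions) > self.threshold


engine = SpamFilterEngine(
    reputation_library={"13800138000": 3, "http://bad.example/win": 4},
    num_levels=4, weight=0.5, threshold=0.5,
)
flagged = engine.is_spam("Call 13800138000 or open http://bad.example/win")
```

Keeping extraction, retrieval and judgment as separate methods mirrors the unit structure of Fig. 4 and makes it straightforward to swap in a different scoring formula.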
The storage unit 440 can store a local copy of the reputation level library of telephone numbers and/or URLs, which may include all or part of the reputation level library that the server determines from the collected set of short message samples. Optionally, the storage unit 440 may also hold other data, such as sender reputation levels (for example, blacklists and whitelists), keywords used for matching and/or semantics-based policies. Optionally, the storage unit 440 may also store the content of received short messages and related information, as well as other data used or generated during spam short message filtering. The storage unit 440 can be implemented by one or more memories, which may be located on a single physical device or distributed across different physical devices. The storage unit can be implemented with any of the memory technologies known to those skilled in the art, and the present invention is not limited in this respect. For example, the storage unit 440 may include a magnetic disk, a magneto-optical disk, an optical disc or semiconductor memory.
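One way to picture a local copy that holds all or part of the server's library, and that an update unit could refresh periodically, is the Python sketch below. The JSON format, the class name `LocalReputationStore` and its methods are assumptions for illustration; the source does not describe a storage or synchronization format.

```python
import json
from typing import Dict, Optional


class LocalReputationStore:
    """Hypothetical local copy of a telephone number/URL reputation library."""

    def __init__(self) -> None:
        self._levels: Dict[str, int] = {}

    def load(self, serialized: str) -> None:
        """Replace the local copy with a full snapshot (JSON object)."""
        self._levels = {str(k): int(v) for k, v in json.loads(serialized).items()}

    def merge_update(self, serialized: str) -> None:
        """Apply a partial update; newer entries overwrite older ones."""
        self._levels.update(
            {str(k): int(v) for k, v in json.loads(serialized).items()}
        )

    def lookup(self, item: str) -> Optional[int]:
        """Return the stored reputation level, or None if the item is unknown."""
        return self._levels.get(item)

    def dump(self) -> str:
        """Serialize the local copy, e.g. for writing to device storage."""
        return json.dumps(self._levels, sort_keys=True)


store = LocalReputationStore()
store.load('{"13800138000": 3}')
store.merge_update('{"http://bad.example/win": 4, "13800138000": 2}')
```

Returning `None` for unknown items lets the caller decide whether an unrated telephone number or URL should be ignored or queried from the server.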
The present invention has been described above in connection with preferred embodiments. Those skilled in the art will understand that the methods and devices shown above are merely exemplary. The method of the invention is not limited to the steps and order shown above. The mobile terminal and server of the invention may include more or fewer components than those shown. Those skilled in the art can make many changes and modifications based on the teaching of the illustrated embodiments.
The device of the present invention and its components can be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of such hardware circuits and software.
The present invention can achieve a number of advantages. It proposes classifying the telephone numbers and URLs in short message content according to a telephone number/URL reputation level library, which improves the overall effectiveness of spam short message filtering.
Those skilled in the art should understand that although the present invention has been described through specific embodiments, its scope is not limited to these embodiments. The scope of the present invention is defined by the claims and any equivalents thereof.

Claims (6)

1. A method for filtering spam short messages, comprising:
extracting telephone numbers and/or uniform resource locators (URLs) from short message content,
retrieving the reputation level of each extracted telephone number and/or URL from a reputation level library of telephone numbers and/or URLs, and
judging, at least according to the reputation level of the telephone number and/or URL, whether the short message is a spam short message;
wherein the reputation level library of telephone numbers and/or URLs is determined from a collected set of short message samples,
wherein judging whether the short message is a spam short message further comprises:
calculating the suspicious degree of the short message, and
if the suspicious degree of the short message is greater than a threshold, judging the short message to be a spam short message,
wherein the suspicious degree of the short message is calculated according to the following formula:
$$ s = W \times \sum_i \left( l_i / L \right) + \sum_j v_j $$
where s denotes the suspicious degree of the short message; l_i denotes the reputation level of a retrieved telephone number or URL; L denotes the number of levels defined in the reputation level library; i = 1, …, N, where N is the total number of telephone numbers and URLs that were extracted and whose reputation level was retrieved, and is an integer greater than zero; W is the weight, within the overall filtering system, of the factor related to the reputation levels of suspicious telephone numbers and/or URLs in the short message content; and v_j denotes the contribution of the other suspicious factors used to identify spam short messages, with j = 1, …, M, where M is the number of such other factors and is an integer greater than or equal to 1.
2. The method according to claim 1, wherein judging whether the short message is a spam short message is further based at least on: the sender reputation level of the short message, keyword matching results and/or semantics-based policies.
3. The method according to claim 1, further comprising: periodically updating the set of short message samples and the reputation level library of telephone numbers and/or URLs.
4. A spam short message filtering engine, comprising:
an extraction unit configured to extract telephone numbers and/or uniform resource locators (URLs) from short message content,
a retrieval unit configured to retrieve the reputation level of each extracted telephone number and/or URL from a reputation level library of telephone numbers and/or URLs,
a judging unit configured to judge, at least according to the reputation level of the telephone number and/or URL, whether the short message is a spam short message;
wherein the reputation level library of telephone numbers and/or URLs is determined from a set of short message samples,
wherein the judging unit further comprises:
a computation subunit that calculates the suspicious degree of the short message,
wherein if the suspicious degree of the short message is greater than a threshold, the short message is judged to be a spam short message,
wherein the computation subunit is configured to calculate the suspicious degree of the short message according to the following formula:
$$ s = W \times \sum_i \left( l_i / L \right) + \sum_j v_j $$
where s denotes the suspicious degree of the short message; l_i denotes the reputation level of a retrieved telephone number or URL; L denotes the number of levels defined in the reputation level library; i = 1, …, N, where N is the total number of telephone numbers and URLs that were extracted and whose reputation level was retrieved, and is an integer greater than zero; W is the weight, within the overall filtering system, of the factor related to the reputation levels of suspicious telephone numbers and/or URLs in the short message content; and v_j denotes the contribution of the other suspicious factors used to identify spam short messages, with j = 1, …, M, where M is the number of such other factors and is an integer greater than or equal to 1.
5. The engine according to claim 4, wherein the judging unit judges whether the short message is a spam short message further based at least on: the sender reputation level of the short message, keyword matching results and/or semantics-based policies.
6. The engine according to claim 4, further comprising: an update unit configured to periodically update the set of short message samples and the reputation level library of telephone numbers and/or URLs.
CN201310646010.8A 2013-12-04 2013-12-04 Method for filtering spam short messages and engine Expired - Fee Related CN103607705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310646010.8A CN103607705B (en) 2013-12-04 2013-12-04 Method for filtering spam short messages and engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310646010.8A CN103607705B (en) 2013-12-04 2013-12-04 Method for filtering spam short messages and engine

Publications (2)

Publication Number Publication Date
CN103607705A CN103607705A (en) 2014-02-26
CN103607705B true CN103607705B (en) 2016-09-21

Family

ID=50125901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310646010.8A Expired - Fee Related CN103607705B (en) 2013-12-04 2013-12-04 Method for filtering spam short messages and engine

Country Status (1)

Country Link
CN (1) CN103607705B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104185158A (en) * 2014-09-01 2014-12-03 北京奇虎科技有限公司 Malicious short message processing method and client based on false base station
CN107229638A (en) * 2016-03-24 2017-10-03 北京搜狗科技发展有限公司 A kind of text message processing method and device
CN107231334A (en) * 2016-03-24 2017-10-03 中国移动通信集团山东有限公司 A kind of short message monitoring method and device
CN106714122B (en) * 2016-05-03 2020-04-28 腾讯科技(深圳)有限公司 Short message transmission virus detection method and device
CN107809410B (en) * 2016-09-09 2019-03-08 腾讯科技(深圳)有限公司 Information filtering method and device
CN106792579A (en) * 2016-12-01 2017-05-31 北京奇虎科技有限公司 A kind of multimedia message hold-up interception method and device
CN106658437A (en) * 2016-12-01 2017-05-10 北京奇虎科技有限公司 Information interception method and device
CN107203580B (en) * 2017-02-27 2018-06-26 广州旺加旺网络科技有限公司 Webpage display method and mobile terminal using same
CN107239504A (en) * 2017-05-10 2017-10-10 上海交通大学 A kind of deep learning algorithm for being used to recognize fraud text message
CN109802915B (en) * 2017-11-16 2021-06-11 中国移动通信集团河南有限公司 Telecommunication fraud detection processing method and device
CN109819125A (en) * 2017-11-20 2019-05-28 中兴通讯股份有限公司 A kind of method and device limiting telecommunication fraud
CN108564238A (en) * 2017-12-28 2018-09-21 百度在线网络技术(北京)有限公司 Data assessment method and apparatus, server, storage medium
CN112671982B (en) * 2020-12-15 2021-09-14 中国信息通信研究院 Crank call identification method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101895868A (en) * 2010-04-29 2010-11-24 上海华勤通讯技术有限公司 Method for filtering fallacious message based on mobile phone
CN102098638A (en) * 2010-12-15 2011-06-15 成都市华为赛门铁克科技有限公司 Short message sorting method and device, and terminal
CN103037339A (en) * 2012-12-28 2013-04-10 深圳市彩讯科技有限公司 Short message filtering method based on user creditworthiness and short message spam degree

Also Published As

Publication number Publication date
CN103607705A (en) 2014-02-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20160921; termination date: 20191204)