CN106878962A - method and device for determining junk information - Google Patents
Method and device for determining junk information
- Publication number
- CN106878962A
- Application number
- CN201510927718.XA
- Authority
- CN
- China
- Prior art keywords
- information
- junk
- content
- described information
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/58—Message adaptation for wireless communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
Abstract
The invention provides a method and device for determining junk information. The method includes: receiving information from an external source and determining the information source and content of the information; judging whether the information is junk information according to the information source and, when the information source does not indicate that the information is junk information, judging whether the information is junk information according to its content; and determining as junk information the information judged to be junk information by the information source or by the content of the information. The technical scheme provided by the invention can quickly identify whether information is junk information from its information source. In addition, while preserving the effectiveness of spam short message identification, it avoids the invasion of user privacy caused by uploading the content of a short message directly to a server without the user's consent; and, because processing is done locally first, it reduces the client's upload burden and the processing load on the cloud server, improving identification efficiency and meeting user needs.
Description
Technical field
The present invention relates to the technical field of network information security, and in particular to a method and device for determining junk information.
Background
With the development of mobile communication technology, the spread of mobile devices and falling short message rates, short messages have become one of the main ways of transmitting information between mobile terminals. While enjoying the convenience of short message communication, users are also harassed by junk information such as advertising short messages and fraudulent short messages. Such junk information degrades the user's short message experience and poses hidden risks to the user's personal, information and property safety. Identifying and intercepting spam short messages is therefore a problem in urgent need of a solution.
In the prior art, spam short messages are usually identified by uploading the content of a short message directly to a server; the server identifies the content and returns the recognition result to the mobile terminal. If the short message is a spam short message, the mobile terminal intercepts it or prompts the user.
This scheme has the following problems:
1. The content of the information often involves the user's personal information. Uploading the content of a short message directly to a server without the user's consent invades the user's privacy and harms the user's information security.
2. Each time the mobile terminal receives a short message, it uploads the content of the short message to the server. This process often consumes considerable network traffic, and the processing speed is limited by the network state of the mobile terminal, which can reduce the performance of the terminal device.
Summary of the invention
In view of the above problems, the present invention proposes a method and device for determining junk information that overcome the above problems or at least partially solve them.
According to one aspect of the invention, a method for determining junk information is provided, including:
Receiving information from an external source, and determining the information source and content of the information;
Judging whether the information is junk information according to the information source and, when the information source does not indicate that the information is junk information, judging whether the information is junk information according to the content of the information;
Determining as junk information the information judged to be junk information by the information source or by the content of the information.
Preferably, judging whether the information is junk information according to the information source includes:
Comparing the information source with the records in a locally recorded junk information source database and, when the information source is a junk information source, determining the information to be junk information; or,
Sending the information source to a cloud server, receiving the indication information returned by the cloud server and, when the indication information identifies the information source as a junk information source, determining the information to be junk information.
Preferably, judging whether the information is junk information according to the content of the information includes:
According to the user's selection, uploading the content of the information directly to the cloud server, or uploading substitute information for the content of the information to the cloud server;
Receiving the identification information returned by the cloud server, and determining whether the information is junk information according to the identification information.
Preferably, uploading the substitute information for the content of the information to the cloud server includes:
Calculating the hash value corresponding to the content of the information;
Uploading the hash value corresponding to the content of the information to the cloud server.
Preferably, calculating the hash value corresponding to the content of the information includes:
Performing word segmentation on the content of the information;
Assigning a different vector value to each word obtained by segmentation, and aggregating them to calculate the simhash value corresponding to the content of the information.
Preferably, when the information is determined to be junk information, the junk information source database held locally or on the cloud server records the information source of the information.
Preferably, the locally recorded junk information source database and the junk information source database recorded on the cloud server are updated interactively with each other.
According to another aspect of the invention, a device for determining junk information is provided, including:
A receiver module, for receiving information from an external source and determining the information source and content of the information;
A processing module, for judging whether the information is junk information according to the information source and, when the information source does not indicate that the information is junk information, for judging whether the information is junk information according to the content of the information;
A determination module, for determining as junk information the information judged to be junk information by the information source or by the content of the information.
Preferably, the processing module judging whether the information is junk information according to the information source includes:
The processing module compares the information source with the records in a locally recorded junk information source database and, when the information source is a junk information source, the determination module determines the information to be junk information; or,
The processing module sends the information source to a cloud server, the receiver module receives the indication information returned by the cloud server and, when the indication information identifies the information source as a junk information source, the determination module determines the information to be junk information.
Preferably, the processing module judging whether the information is junk information according to the content of the information includes:
According to the user's selection, the processing module uploads the content of the information directly to the cloud server, or uploads substitute information for the content of the information to the cloud server;
The receiver module receives the identification information returned by the cloud server, and the determination module determines whether the information is junk information according to the identification information.
Preferably, the processing module uploading the substitute information for the content of the information to the cloud server includes:
The processing module calculates the hash value corresponding to the content of the information;
The processing module uploads the hash value corresponding to the content of the information to the cloud server.
Preferably, the processing module calculating the hash value corresponding to the content of the information includes:
The processing module performs word segmentation on the content of the information;
The processing module assigns a different vector value to each word obtained by segmentation, and aggregates them to calculate the simhash value corresponding to the content of the information.
Preferably, when the determination module determines the information to be junk information, the junk information source database held locally or on the cloud server records the information source of the information.
Preferably, the receiver module and the processing module update the locally recorded junk information source database and the junk information source database recorded on the cloud server interactively with each other.
The above scheme provided by the invention can quickly identify whether information is junk information from its information source. In addition, while preserving the effectiveness of spam short message identification, it avoids the invasion of user privacy caused by uploading the content of a short message directly to a server without the user's consent; and, because processing is done locally first, it reduces the client's upload burden and the processing load on the cloud server, improving identification efficiency and meeting user needs.
Additional aspects and advantages of the invention will be set forth in part in the description that follows; they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flow chart of a method for determining junk information according to an embodiment of the invention;
Fig. 2 shows a flow chart of a method for determining junk information according to another embodiment of the invention;
Fig. 3 shows a schematic diagram of a device for determining junk information according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of a cloud server for determining junk information according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals throughout denote identical or similar elements, or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should be further understood that the word "including" used in the specification of the invention refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connection" or "coupling" as used herein can include wireless connection or wireless coupling. The wording "and/or" used herein includes any one of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
Fig. 1 shows a flow chart of a method for determining junk information according to an embodiment of the invention. As shown in Fig. 1, the method includes:
Step S110: receive information from an external source, and determine the information source and content of the information;
Step S120: judge whether the information is junk information according to the information source; when the information source does not indicate that the information is junk information, judge whether the information is junk information according to the content of the information;
Step S130: determine as junk information the information judged to be junk information by the information source or by the content of the information.
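The three steps above can be sketched as a small function. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `is_junk`, the two callables, the sample numbers and the toy content check are all hypothetical.

```python
from typing import Callable, List


def is_junk(source: str, content: str,
            source_is_junk: Callable[[str], bool],
            content_is_junk: Callable[[str], bool]) -> bool:
    """Judge by information source first; fall back to content only if needed."""
    if source_is_junk(source):         # step S120, source check
        return True                    # step S130
    return content_is_junk(content)    # step S120, content check, then S130


# Hypothetical data for illustration only.
junk_sources = {"13800000000"}
content_checks: List[str] = []         # records which contents were inspected


def toy_content_check(text: str) -> bool:
    """Stand-in for a real content-based judgment."""
    content_checks.append(text)
    return "grand prize" in text


by_source = is_junk("13800000000", "hello",
                    junk_sources.__contains__, toy_content_check)
by_content = is_junk("13900000000", "you won a grand prize",
                     junk_sources.__contains__, toy_content_check)
print(by_source, by_content, content_checks)
```

In this sketch the content of the first message is never inspected, because its source alone is enough to classify it.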
In the present invention, information includes but is not limited to short messages, instant messaging messages and other conventional or unconventional information. An information source includes but is not limited to a mobile phone number, an information ID or any other identifier that can mark where information originates. Without loss of generality and for ease of description, some examples below use a short message in place of information and a mobile phone number in place of an information source. It should be understood that this only serves to explain the invention and is not to be construed as limiting the claims.
The method shown in Fig. 1 is described from the client's point of view: whether information is junk information is first judged according to the information source, and when the information cannot be determined to be junk information from the information source, the judgment is then made on the information content itself.
Specifically, in step S120, judging whether the information is junk information according to the information source includes:
Comparing the information source with the records in the locally recorded junk information source database and, when the information source is a junk information source, determining the information to be junk information; or,
Sending the information source to the cloud server, receiving the indication information returned by the cloud server and, when the indication information identifies the information source as a junk information source, determining the information to be junk information.
While preserving the effectiveness of junk information identification, this method avoids the invasion of user privacy caused by uploading the content of the information directly to a server without the user's consent.
Furthermore, when the information cannot be determined to be junk information from the information source, the judgment is made on the information content itself. Specifically, judging whether the information is junk information according to the content of the information includes:
According to the user's selection, uploading the content of the information directly to the cloud server, or uploading substitute information for the content of the information to the cloud server;
Receiving the identification information returned by the cloud server, and determining whether the information is junk information according to the identification information.
For example, when the user starts the spam short message identification client on the mobile terminal for the first time, a statement agreement pops up asking whether the user agrees to upload short message content directly to the cloud server. If the user agrees, then whenever the mobile terminal receives a short message, the content of the short message is uploaded directly to the cloud server for identification. If the user does not agree, then whenever the mobile terminal receives a short message, the step described above of uploading substitute information for the content of the information to the cloud server is performed. By starting from the user's wishes, this embodiment fundamentally solves the problems in the prior art of invading user privacy and endangering user information security.
Specifically, uploading the substitute information for the content of the information to the cloud server includes:
Calculating the hash value corresponding to the content of the information;
Uploading the hash value corresponding to the content of the information to the cloud server.
Furthermore, calculating the hash value corresponding to the content of the information includes:
Performing word segmentation on the content of the information;
Assigning a different vector value to each word obtained by segmentation, and aggregating them to calculate the simhash value corresponding to the content of the information.
For example, taking a short message, the hash value corresponding to the content of the short message serves as the object of identification, and spam short messages are identified through the interaction between the client and the cloud server. While preserving the effectiveness of spam short message identification, this method avoids the invasion of user privacy caused by uploading the content of a short message directly to a server without the user's consent; and, because processing is done locally first, it reduces the interaction burden between the client and the cloud server and the processing load on the cloud server, improving identification efficiency and meeting user needs.
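The consent-dependent choice between uploading the content and uploading a substitute can be sketched as follows. This is a hedged illustration: the function `build_upload`, the payload keys and the use of SHA-1 as the substitute are assumptions made for the example; the text above describes a simhash value as the substitute in its preferred form.

```python
import hashlib


def build_upload(content: str, user_agreed: bool) -> dict:
    """Return what the client would send to the cloud server (hypothetical shape)."""
    if user_agreed:
        # The user consented: the raw content may be uploaded.
        return {"kind": "content", "value": content}
    # No consent: only a hash computed locally leaves the device.
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
    return {"kind": "hash", "value": digest}


with_consent = build_upload("hello", user_agreed=True)
without_consent = build_upload("hello", user_agreed=False)
print(with_consent["kind"], without_consent["kind"])
```

The point of the design is that, without consent, the readable text never appears in the payload.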
In one embodiment of the invention, the method shown in Fig. 1 further includes:
When the short message is determined to be a spam short message, intercepting the short message. The interception here may specifically include: deleting the spam short message directly; or moving the spam short message to a designated folder to await handling by the user; or notifying the user that the short message is spam.
In one embodiment of the invention, calculating the hash value corresponding to the content of the short message means mapping the content of the short message to a fixed-length numerical value according to some traditional hash algorithm. This value is called the hash value, and it is a unique and extremely compact numerical representation of the content of the short message.
The hash algorithms referred to in this embodiment include HAVAL, MD2, MD4, MD5, SHA1 and the like. Traditional hash algorithms of this kind all share one basic characteristic: hash collisions rarely occur in the input domain, that is, two texts that differ by as little as one byte are also mapped to two entirely different hash values.
For example, the contents of two fraudulent short messages are "Congratulations, you have won a 50,000 yuan grand prize" and "Congratulations, you have won a 10,000 yuan grand prize", and the hash values calculated for the two contents by a traditional hash algorithm are 286 and 523 respectively. As can be seen, two short messages with very similar fraud tactics yield entirely different hash values, which may place excessive identification pressure on the cloud server in subsequent processing.
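The behaviour described above can be observed with any of the listed traditional algorithms. The sketch below uses SHA-1 from Python's standard `hashlib`; the two message strings are English renderings of the example and the helper names are hypothetical.

```python
import hashlib

msg_a = "Congratulations, you have won a 50,000 yuan grand prize"
msg_b = "Congratulations, you have won a 10,000 yuan grand prize"


def sha1_as_int(text: str) -> int:
    """Hash the text with SHA-1 and return the 160-bit digest as an integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def hamming(x: int, y: int) -> int:
    """Count the bit positions in which two integers differ."""
    return bin(x ^ y).count("1")


differing_bits = hamming(sha1_as_int(msg_a), sha1_as_int(msg_b))
print(differing_bits, "of 160 bits differ")
```

Although the two inputs differ in a single character, the digests are unrelated, so a server keyed on such hashes would treat the two messages as entirely separate items.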
Therefore, to remove the deviation between the hash values of short message contents that differ only slightly, and to reduce the identification pressure on the cloud server in subsequent processing, in another embodiment of the invention calculating the hash value corresponding to the content of the short message means calculating the simhash value corresponding to the content of the short message. The detailed process is as follows:
Perform word segmentation on the content of the short message.
Assign a different vector value to each word obtained by segmentation, and aggregate them to calculate the simhash value corresponding to the content of the short message.
The scheme of the invention is described in detail below, taking Table 1 and Table 2 as examples.
Table 1: A process of calculating the simhash value corresponding to the content of a short message
Table 1 shows the process of calculating the simhash value corresponding to the content of a short message according to one specific embodiment of the invention. As shown in Table 1, in this embodiment the short message received by the mobile terminal is: "Our company issues ordinary invoices on behalf of others; our company does not issue VAT special invoices and professional invoices on behalf of others."
First, initialize the vector corresponding to the simhash value: A = A0 = {0,0,0,0,0,0}.
Then perform word segmentation on the content of the short message: our company / issue on behalf / ordinary / invoice / , / our company / not / issue on behalf / VAT / special / invoice / and / professional / invoice. The distinct words obtained after segmentation are: our company, issue on behalf, not, VAT, special, invoice, ordinary, and, professional.
According to some traditional hash algorithm, calculate a 6-bit hash value for each word: our company: 100110, issue on behalf: 110000, not: 101111, VAT: 110001, special: 010110, invoice: 101011, ordinary: 110100, and: 110110, professional: 001001.
Next, calculate the word frequency of each word and use it as the corresponding vector value, representing the weight of each word in the content of the short message: our company: 2, issue on behalf: 2, not: 1, VAT: 1, special: 1, invoice: 3, ordinary: 1, and: 1, professional: 1.
These form a vector B: {our company/100110/2, issue on behalf/110000/2, not/101111/1, VAT/110001/1, special/010110/1, invoice/101011/3, ordinary/110100/1, and/110110/1, professional/001001/1}.
Process each word in vector B in turn, as follows: for each word, if the i-th bit of its hash value is "1", add the word frequency of the word to the i-th dimension of vector A; if the i-th bit of its hash value is "0", subtract the word frequency of the word from the i-th dimension of vector A. For example, for our company/100110/2, vector A becomes {2,-2,-2,2,2,-2}; for issue on behalf/110000/2, vector A becomes {2,2,-2,-2,-2,-2}; and so on, giving the vector A corresponding to each word, as shown in Table 1.
Sum the vectors A corresponding to all the words to obtain the vector Atotal = {9,-1,-3,1,5,1}. If the i-th dimension of this vector is positive, set the i-th dimension of the vector corresponding to the simhash value to "1"; if the i-th dimension is negative, set it to "0". This gives the final vector corresponding to the simhash value, Afinal = {1,0,0,1,1,1}.
Therefore, the simhash value of the short message "Our company issues ordinary invoices on behalf of others; our company does not issue VAT special invoices and professional invoices on behalf of others." is 100111.
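The per-word step and the final sign step described above can be sketched in a few lines. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, word segmentation and per-word hashing are assumed to have been done already, and mapping a zero sum to "1" is an assumption of this sketch, since the rule above only covers positive and negative sums.

```python
from typing import Iterable, List, Tuple


def contribution(hash_bits: str, weight: int) -> List[int]:
    """Vector for one word: +weight where the hash bit is 1, -weight where it is 0."""
    return [weight if bit == "1" else -weight for bit in hash_bits]


def accumulate(words: Iterable[Tuple[str, int]], width: int = 6) -> List[int]:
    """Sum the per-word vectors dimension by dimension."""
    total = [0] * width
    for hash_bits, weight in words:
        for i, value in enumerate(contribution(hash_bits, weight)):
            total[i] += value
    return total


def fingerprint(total: List[int]) -> str:
    """Keep only the sign of each dimension (zero treated as 1 by assumption)."""
    return "".join("1" if value >= 0 else "0" for value in total)


# Values taken from the worked example in the text.
print(contribution("100110", 2))           # [2, -2, -2, 2, 2, -2]
print(contribution("110000", 2))           # [2, 2, -2, -2, -2, -2]
print(fingerprint([9, -1, -3, 1, 5, 1]))   # 100111
```

Because only the signs survive the last step, small changes in word weights leave the fingerprint unchanged unless they flip the sign of some dimension.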
Table 2: Another process of calculating the simhash value corresponding to the content of a short message
Table 2 shows the process of calculating the simhash value corresponding to the content of a short message according to another specific embodiment of the invention. As shown in Table 2, in this embodiment the short message received by the mobile terminal is: "Our company issues ordinary invoices on behalf of others; our company does not issue special invoices and professional invoices on behalf of others." Its simhash value is calculated in the same way as in Table 1, so the process is not repeated here. As Table 2 shows, summation yields the vector Atotal = {8,-2,-2,0,6,0}, and the final vector corresponding to the simhash value is Afinal = {1,0,0,1,1,1}. Therefore, the simhash value of this short message is 100111, the same as the simhash value of the short message "Our company issues ordinary invoices on behalf of others; our company does not issue VAT special invoices and professional invoices on behalf of others."
As can be seen from the above, the calculation of the simhash value preserves the weight of each word while progressively discarding the specific magnitude of each word's hash value, keeping only the sign of each dimension after summation. Short message contents that are similar tend to have a similar text structure and therefore yield summation vectors Atotal with the same signs. Similar short messages can therefore have the same simhash value, which overcomes the problem that traditional hash algorithms scatter similar inputs.
In addition, in other embodiments of the invention, vector values may be assigned to each word after segmentation in other ways.
In an embodiment of the invention, when a piece of information is determined to be junk information, the junk information source database held locally or on the cloud server records the information source of that information. That is, the junk information source database of the invention continuously records new information sources classified as junk information sources.
Furthermore, the locally recorded junk information source database and the junk information source database recorded on the cloud server are updated interactively with each other.
Thus, taking a mobile phone as the terminal device, when a mobile phone frequently sends spam short messages, its phone number is quickly entered into the junk information source database. Once this has propagated over the Internet and the clients' local junk information source databases have been updated in real time, any further spam short message sent by that phone is blocked and shielded at the first opportunity by other clients that recognize it as a junk information source. Even if a client does receive junk information sent by that phone, it can identify it as junk information immediately from the information source, without analyzing the content of the information.
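The recording and mutual update of the two junk information source databases can be pictured with a very small sketch. Everything here is hypothetical: the class `SourceDatabase`, the function `synchronize` and the sample numbers are illustrative, and a real client would exchange updates over the network rather than share in-memory sets.

```python
class SourceDatabase:
    """Toy junk information source database backed by a set."""

    def __init__(self, sources=()):
        self.sources = set(sources)

    def record(self, source: str) -> None:
        """Record a source newly classified as a junk information source."""
        self.sources.add(source)

    def contains(self, source: str) -> bool:
        return source in self.sources


def synchronize(local: SourceDatabase, cloud: SourceDatabase) -> None:
    """Bring both databases to the union of their records."""
    merged = local.sources | cloud.sources
    local.sources = set(merged)
    cloud.sources = set(merged)


local_db = SourceDatabase({"13800000000"})
cloud_db = SourceDatabase({"13900000000"})
local_db.record("13700000000")     # a source newly identified on this client
synchronize(local_db, cloud_db)
print(sorted(local_db.sources))
```

After synchronization, a source first identified on one client is known to the cloud copy, and any source already known to the cloud is known locally.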
It should be understood that, although the method of the invention disclosed above is described from the client's point of view, some of its functions may also be performed on the server side; such technical schemes should likewise be understood as falling within the scope of the disclosure of the invention.
Fig. 2 shows a flow chart of a method for determining junk information according to another embodiment of the invention. As shown in Fig. 2, the method includes:
Step S210: receive the number that sent the short message, or the hash value corresponding to the content of the short message, uploaded by the client.
Step S220: match the number that sent the short message, or the hash value corresponding to the content of the short message, against the hash value library.
In this step, the cloud server uses the junk information source database built from its historical identification records to judge whether the number of the uploaded short message has already been recorded, that is, whether it is present in the junk information source database.
Alternatively, in this step, the hash value library stores the hash values corresponding to different short message contents together with the black or white identification information already determined for them. Black identification information indicates that the information is junk information; white identification information indicates that the information is not junk information.
In one embodiment of the invention, the hash value library is built by the cloud server from its historical identification records: each time the cloud server identifies spam short messages, whichever identification method is used, it records the features of the identified short message, such as its content, keywords or hash value, together with the identification information; the records pairing hash values with identification information are then taken to build the hash value library.
Step S230: return the identification information to the client.
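A server-side lookup along the lines of steps S210 to S230 might look like the sketch below. It is an assumption-laden illustration: the names `JUNK_SOURCES`, `HASH_LIBRARY` and `identify`, the sample entries, and the "unknown" result returned when nothing matches are all hypothetical and not taken from the text.

```python
from typing import Optional

# Hypothetical server-side stores built from historical identification records.
JUNK_SOURCES = {"13800000000"}
HASH_LIBRARY = {"100111": "black", "010010": "white"}


def identify(number: Optional[str] = None,
             content_hash: Optional[str] = None) -> str:
    """Return identification information for an uploaded number or content hash."""
    if number is not None and number in JUNK_SOURCES:
        return "black"                                    # known junk source
    if content_hash is not None:
        return HASH_LIBRARY.get(content_hash, "unknown")  # no match: assumed label
    return "unknown"


print(identify(number="13800000000"))
print(identify(content_hash="100111"))
print(identify(content_hash="010010"))
```

The client never needs to send message text in this flow; the number or the hash is enough for the server to answer.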
It can be seen that the method shown in Fig. 2 describes the process in which the cloud server, after receiving from the client the sender number of a short message or the hash value corresponding to its content, returns identification information to the client. While preserving the effectiveness of spam short message identification, the method avoids the invasion of user privacy that would result from uploading the content of a short message directly to the server without the user's consent. In addition, because part of the computation is performed locally, the processing load on the cloud server and the interaction overhead between the cloud server and the client are reduced, which improves identification efficiency and meets user needs.
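Steps S210 to S230 can be pictured with a minimal Python sketch. It assumes both stores are simple in-memory collections and that the hash value is matched exactly; the names `JUNK_SOURCES`, `HASH_LIBRARY`, and `identify`, the sample number and hash values, and the "unknown" result for an unmatched upload are all illustrative assumptions, not details given in the description.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the cloud server's two stores.
JUNK_SOURCES = {"+8613800000000"}                   # numbers already recorded as junk sources
HASH_LIBRARY = {0x1F3A: "black", 0x77C2: "white"}   # content hash -> identification information


def identify(sender: Optional[str] = None, content_hash: Optional[int] = None) -> str:
    """Return 'black', 'white', or 'unknown' for an uploaded number or content hash."""
    # Step S220, first branch: is the sender number a recorded junk source?
    if sender is not None and sender in JUNK_SOURCES:
        return "black"
    # Step S220, second branch: exact lookup of the content hash in the library.
    if content_hash is not None and content_hash in HASH_LIBRARY:
        return HASH_LIBRARY[content_hash]
    # No record in either store; what is returned here is an assumption of this sketch.
    return "unknown"
```

In this sketch the value returned by `identify` plays the role of the identification information sent back to the client in step S230.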
In one embodiment of the invention, taking a mobile phone as an example of a terminal device: when a phone frequently sends spam short messages, its number is quickly added to the junk information source database by the cloud server. When a client receives junk information sent by that phone, the cloud server immediately identifies it as junk information from the information source alone, without analyzing the content of the short message.
In one embodiment of the invention, the hash value corresponding to the content of the short message received by the cloud server is the simhash value of that content; correspondingly, the hash value library of the cloud server is specifically a simhash value library.
In one embodiment of the invention, the method shown in Fig. 2 further includes:
Step S240 (not shown): receive short message content reported by users.
Step S250 (not shown): identify each short message content reported by users as black or white, generate the corresponding simhash value, and save the simhash value together with its identification information in the hash value library.
In this step, the cloud server calculates the simhash value corresponding to the content of the short message. The process is similar to the client-side simhash calculation described above and is not repeated here.
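As a rough illustration of steps S240 and S250, the sketch below stores a reported message's fingerprint with its black or white label. The description does not say how the black or white decision is made, so the label is simply passed in; `record_report` and its parameters are hypothetical names, and the fingerprint function is injected so that any simhash implementation can be used.

```python
from typing import Callable, Dict


def record_report(
    library: Dict[int, str],
    content: str,
    is_junk: bool,
    fingerprint_fn: Callable[[str], int],
) -> int:
    """Store the fingerprint of a reported message with its black/white label."""
    # Step S250: generate the fingerprint for the reported content.
    fingerprint = fingerprint_fn(content)
    # Save the fingerprint together with its identification information.
    library[fingerprint] = "black" if is_junk else "white"
    return fingerprint
```

A later report with the same fingerprint overwrites the earlier label in this sketch; a real service would likely need a policy for conflicting reports.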
Fig. 3 shows a schematic diagram of a device for determining junk information according to an embodiment of the present invention. As shown in Fig. 3, the device 300 for determining junk information includes:
Receiver module 310, configured to receive information from outside and to determine the information source and content of the information;
Processing module 320, configured to judge whether the information is junk information according to the information source and, when the information is judged not to be junk information according to the information source, to judge whether the information is junk information according to the content of the information;
Determination module 330, configured to determine as junk information any information judged to be junk information by its information source or by its content.
As an embodiment of the device 300 for determining junk information, the processing module 320 judging whether the information is junk information according to the information source includes:
The processing module 320 compares the information source with the records in the locally recorded junk information source database, and when the information source is a junk information source, the determination module 330 determines the information to be junk information; or,
The processing module 320 sends the information source to the cloud server, the receiver module 310 receives the indication information returned by the cloud server, and when the indication information identifies the information source as a junk information source, the determination module 330 determines the information to be junk information.
Further, the processing module 320 judging whether the information is junk information according to the content of the information includes:
According to the user's selection, the processing module 320 uploads the content of the information directly to the cloud server, or the processing module 320 uploads substitute information for the content of the information to the cloud server;
The receiver module 310 receives the identification information returned by the cloud server, and the determination module 330 determines whether the information is junk information according to the identification information.
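The two-stage decision made by modules 320 and 330 can be sketched as follows. This is a minimal illustration under stated assumptions: the local junk source database is modelled as a set, the cloud round trip is abstracted into a `check_content` callback, and the names `is_junk`, `build_payload`, and `allow_raw_upload` are hypothetical.

```python
from typing import Callable, Set


def is_junk(
    source: str,
    content: str,
    local_junk_sources: Set[str],
    check_content: Callable[[str], bool],
) -> bool:
    """Two-stage decision: information source first, content only if needed."""
    # Stage 1: compare the source with the locally recorded junk sources.
    if source in local_junk_sources:
        return True  # the content is never inspected
    # Stage 2: the source did not mark the message as junk, so judge by content.
    return check_content(content)


def build_payload(content: str, allow_raw_upload: bool, hash_fn: Callable[[str], int]) -> dict:
    """Choose what to upload: raw content if the user allows it, otherwise a hash."""
    if allow_raw_upload:
        return {"content": content}
    return {"content_hash": hash_fn(content)}
```

Keeping the user's choice in `build_payload` means the raw text leaves the device only when the user has opted in, which is the privacy property the description emphasizes.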
As an embodiment of the device 300 for determining junk information, the processing module 320 uploading the substitute information for the content of the information to the cloud server includes:
The processing module 320 calculates the hash value corresponding to the content of the information;
The processing module 320 uploads the hash value corresponding to the content of the information to the cloud server.
Further, the processing module 320 calculating the hash value corresponding to the content of the information includes:
The processing module 320 performs word segmentation on the content of the information;
The processing module 320 assigns a different vector value to each word obtained by segmentation, and aggregates these values to calculate the simhash value corresponding to the content of the information.
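The segmentation and aggregation steps above follow the general shape of the simhash technique, which can be sketched in Python as below. Several choices here are assumptions rather than details from the description: whitespace splitting stands in for real word segmentation (Chinese text would need a proper segmenter), each word's weight is its occurrence count, and the per-word hash is the first 64 bits of an MD5 digest.

```python
import hashlib
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Compute a simhash fingerprint of `text` (whitespace tokens, count weights)."""
    # Word segmentation stand-in: lower-case and split on whitespace.
    weights = Counter(text.lower().split())
    totals = [0] * bits  # one signed accumulator per bit position
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Add the weight where the token's bit is 1, subtract it where it is 0.
            totals[i] += weight if (token_hash >> i) & 1 else -weight
    # Collapse the accumulators: positions with a positive total become 1-bits.
    return sum(1 << i for i in range(bits) if totals[i] > 0)
```

Because the accumulators are sums over all words, the fingerprint does not depend on word order, and changing a few words tends to flip only a few bits.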
The processing module 320 sends the information source of the information, or the hash value corresponding to its content, to the cloud server; after making its judgment, the cloud server returns identification information to the receiver module 310. Therefore, while preserving the effectiveness of spam short message identification, the device 300 for determining junk information avoids the invasion of user privacy that would result from uploading the content of a short message directly to the server without the user's consent. In addition, because part of the computation is performed locally, the processing load on the cloud server and the interaction overhead between the cloud server and the client are reduced, which improves identification efficiency and meets user needs.
In one embodiment of the invention, the processing module 320 is adapted to calculate the hash value corresponding to the content of the short message using a traditional hash algorithm. The hash algorithms of this embodiment include HAVAL, MD2, MD4, MD5, SHA1, and the like. As can be understood from the foregoing, traditional hash algorithms of this kind share one basic characteristic: hash collisions rarely occur within the input domain, so two texts that differ by only a single byte may still map to two entirely different hash values.
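The single-byte sensitivity described above is easy to observe with Python's standard `hashlib`. The snippet below is only an illustration: the two sample messages are invented, SHA1 is picked from the listed algorithms, and `differing_bits` is a hypothetical helper name.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA1 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha1(a).digest(), "big")
    db = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(da ^ db).count("1")


# Two invented messages that differ in exactly one byte.
msg_a = b"You have won a prize, reply 1 to claim"
msg_b = b"You have won a prize, reply 2 to claim"
changed = differing_bits(msg_a, msg_b)  # number of the 160 digest bits that differ
```

For a well-mixed hash, `changed` is expected to be a large fraction of the 160 digest bits, so an exact-match library built on such hashes treats the two near-identical messages as unrelated.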
Therefore, in order to eliminate the deviation between the hash values of short messages whose contents differ only slightly, and to reduce the identification load on the cloud server in subsequent processing, in another embodiment of the invention the processing module 320 is adapted to perform word segmentation on the content of the short message, assign a different vector value to each word obtained by segmentation, and aggregate these values to calculate the simhash value corresponding to the content of the short message. A specific example of the processing module 320 calculating the simhash value of a short message's content is shown in Table 1; it is described in detail above and is not repeated here.
In addition, when the determination module 330 determines that the information is junk information, the information source of the information is recorded in the junk information source database, either locally or on the cloud server.
Further, the receiver module 310 and the processing module 320 interactively update the locally recorded junk information source database and the junk information source database recorded on the cloud server against each other.
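One simple way to picture the recording and interactive update is a two-way set union, sketched below. This is an assumption made for illustration: the description does not specify the synchronization protocol, and a real system would also have to handle removals, timestamps, and network transfer. The function names are hypothetical.

```python
from typing import Set


def record_junk_source(database: Set[str], source: str) -> None:
    """Record the source of a message that was determined to be junk."""
    database.add(source)


def sync_junk_sources(local: Set[str], cloud: Set[str]) -> None:
    """Two-way update: afterwards both databases hold the union of their records."""
    local.update(cloud)   # pull sources known only to the cloud server
    cloud.update(local)   # push sources known only to this device
```

After `sync_junk_sources` runs, a source reported by any one client becomes visible to every other client on its next synchronization.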
Therefore, taking a mobile phone as an example of a terminal device: when a phone frequently sends spam short messages, its number is, through propagation over the Internet, quickly added to the junk information source database by the device 300 for determining junk information. When the phone continues to send spam short messages, they are immediately intercepted and blocked by the other clients that have already identified the phone as a junk information source. Even if the device 300 for determining junk information receives junk information sent by the phone, it can immediately identify it as junk information from the information source, without analyzing the content of the information.
Fig. 4 shows a schematic diagram of a cloud server for determining junk information according to an embodiment of the present invention.
As shown in Fig. 4, the cloud server 400 for determining junk information includes:
Receiving unit 410, configured to receive the sender number of a short message, or the hash value corresponding to the content of the short message, uploaded by the client.
Recognition unit 420, configured to match the sender number of the short message, or the hash value corresponding to its content, against the hash value library.
In this unit, the cloud server consults the junk information source database, built from its historical identification records, to judge whether the number of the uploaded short message has already been recorded, i.e., whether it is present in the junk information source database.
Alternatively, in this unit, the hash value library stores the hash values corresponding to different short message contents together with the black or white identification information already assigned to them. In one embodiment of the invention, the hash value library is built by the cloud server 400 from its historical identification records. Each time the cloud server 400 identifies spam short messages, whichever identification method is chosen, it records features of the identified short message, such as its content, keywords, or hash value, together with the identification information. The records that pair hash values with identification information are then extracted to build the hash value library.
Feedback unit 430, configured to return the identification information to the client.
It can be seen that the scheme shown in Fig. 4 describes the process in which, after the receiving unit 410 receives from the client the sender number of a short message or the hash value corresponding to its content, the feedback unit 430 returns identification information to the client. While preserving the effectiveness of spam short message identification, the scheme avoids the invasion of user privacy that would result from uploading the content of a short message directly to the server without the user's consent. In addition, because part of the computation is performed locally, the processing load on the cloud server and the interaction overhead between the cloud server and the client are reduced, which improves identification efficiency and meets user needs.
In one embodiment of the invention, taking a mobile phone as an example of a terminal device: when a phone frequently sends spam short messages, its number is quickly added to the junk information source database by the recognition unit 420. When a client receives junk information sent by that phone, the recognition unit 420 immediately identifies it as junk information from the information source alone, without analyzing the content of the short message.
In one embodiment of the invention, the hash value corresponding to the content of the short message received by the receiving unit 410 is the simhash value of that content; correspondingly, the hash value library of the cloud server is specifically a simhash value library.
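A simhash value library is usually queried by bit distance rather than exact equality, since near-identical texts yield fingerprints that differ in only a few bits. The sketch below shows one such lookup. It is an assumption for illustration: this section does not state the matching rule, the 3-bit threshold is arbitrary, and `lookup_simhash` is a hypothetical name.

```python
from typing import Dict, Optional


def lookup_simhash(
    library: Dict[int, str], fingerprint: int, max_distance: int = 3
) -> Optional[str]:
    """Return the label of the closest stored fingerprint within max_distance bits."""
    best_label, best_distance = None, max_distance + 1
    for stored, label in library.items():
        # Hamming distance: number of bit positions where the two values differ.
        distance = bin(stored ^ fingerprint).count("1")
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label  # None when nothing in the library is close enough
```

A linear scan like this is only practical for small libraries; a production service would need an index suited to Hamming-distance search.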
In one embodiment of the invention, the receiving unit 410 is further adapted to receive short message content reported by users; the recognition unit 420 is further adapted to identify each short message content reported by users as black or white, generate the corresponding simhash value, and save the simhash value together with its identification information in the hash value library. The recognition unit 420 calculates the simhash value corresponding to the content of the short message by a process similar to the client-side simhash calculation described above, which is not repeated here.
Those skilled in the art will appreciate that the present invention includes devices for performing one or more of the operations described in this application. These devices may be specially designed and manufactured for the required purposes, or may comprise known devices in general-purpose computers. These devices have computer programs stored in them, which are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (for example, computer-readable) medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
Those skilled in the art will appreciate that each block in these structure diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks in them, can be implemented with computer program instructions. Those skilled in the art will also appreciate that these computer program instructions can be supplied to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus for execution, so that the schemes specified in the block or blocks of the structure diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention are carried out by the processor of the computer or other programmable data processing apparatus.
Those skilled in the art will appreciate that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that correspond to the various operations, methods, and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method for determining junk information, characterized by comprising:
receiving information from outside, and determining the information source and content of the information;
judging whether the information is junk information according to the information source, and, when the information is judged not to be junk information according to the information source, judging whether the information is junk information according to the content of the information;
determining as junk information the information judged to be junk information by the information source or by the content of the information.
2. The method according to claim 1, characterized in that judging whether the information is junk information according to the information source comprises:
comparing the information source with the records in a locally recorded junk information source database, and, when the information source is a junk information source, determining the information to be junk information; or,
sending the information source to a cloud server and receiving the indication information returned by the cloud server, and, when the indication information identifies the information source as a junk information source, determining the information to be junk information.
3. The method according to claim 1 or 2, characterized in that judging whether the information is junk information according to the content of the information comprises:
according to the user's selection, uploading the content of the information directly to a cloud server, or uploading substitute information for the content of the information to the cloud server;
receiving the identification information returned by the cloud server, and determining whether the information is junk information according to the identification information.
4. The method according to claim 3, characterized in that uploading the substitute information for the content of the information to the cloud server comprises:
calculating the hash value corresponding to the content of the information;
uploading the hash value corresponding to the content of the information to the cloud server.
5. The method according to claim 4, characterized in that calculating the hash value corresponding to the content of the information comprises:
performing word segmentation on the content of the information;
assigning a different vector value to each word obtained by segmentation, and aggregating these values to calculate the simhash value corresponding to the content of the information.
6. The method according to claim 3, characterized in that, when the information is determined to be junk information, the information source of the information is recorded in the junk information source database, either locally or on the cloud server.
7. The method according to claim 6, characterized in that the locally recorded junk information source database and the junk information source database recorded on the cloud server are interactively updated against each other.
8. A device for determining junk information, characterized by comprising:
a receiver module, configured to receive information from outside and to determine the information source and content of the information;
a processing module, configured to judge whether the information is junk information according to the information source and, when the information is judged not to be junk information according to the information source, to judge whether the information is junk information according to the content of the information;
a determination module, configured to determine as junk information the information judged to be junk information by the information source or by the content of the information.
9. The device according to claim 8, characterized in that the processing module judging whether the information is junk information according to the information source comprises:
the processing module compares the information source with the records in a locally recorded junk information source database, and, when the information source is a junk information source, the determination module determines the information to be junk information; or,
the processing module sends the information source to a cloud server, the receiver module receives the indication information returned by the cloud server, and, when the indication information identifies the information source as a junk information source, the determination module determines the information to be junk information.
10. The device according to claim 8 or 9, characterized in that the processing module judging whether the information is junk information according to the content of the information comprises:
according to the user's selection, the processing module uploads the content of the information directly to a cloud server, or the processing module uploads substitute information for the content of the information to the cloud server;
the receiver module receives the identification information returned by the cloud server, and the determination module determines whether the information is junk information according to the identification information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510927718.XA CN106878962A (en) | 2015-12-14 | 2015-12-14 | method and device for determining junk information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510927718.XA CN106878962A (en) | 2015-12-14 | 2015-12-14 | method and device for determining junk information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106878962A true CN106878962A (en) | 2017-06-20 |
Family
ID=59178614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510927718.XA Pending CN106878962A (en) | 2015-12-14 | 2015-12-14 | method and device for determining junk information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106878962A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050144279A1 (en) * | 2003-12-31 | 2005-06-30 | Wexelblat David E. | Transactional white-listing for electronic communications |
CN104254074A (en) * | 2013-06-28 | 2014-12-31 | 腾讯科技(深圳)有限公司 | Method and device for intercepting spam short messages |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
RU2708508C1 (en) | Method and a computing device for detecting suspicious users in messaging systems | |
US8549642B2 (en) | Method and system for using spam e-mail honeypots to identify potential malware containing e-mails | |
CN104640092B (en) | Identify the method for refuse messages, client, cloud server and system | |
CN104378283B (en) | A kind of sensitive mail filtering system and method based on customer end/server mode | |
US9614866B2 (en) | System, method and computer program product for sending information extracted from a potentially unwanted data sample to generate a signature | |
US8886664B2 (en) | Decreasing duplicates and loops in an activity record | |
CN104184653B (en) | A kind of method and apparatus of message screening | |
CN104794170A (en) | Network evidence taking content tracing method based on multiple fingerprint Hash bloom filters | |
WO2016082568A1 (en) | Short message safe processing method and apparatus | |
CN101141416A (en) | Real-time rubbish mail filtering method and system used for transmission influx stage | |
CN111752973A (en) | System and method for generating heuristic rules for identifying spam e-mails | |
US20220109621A1 (en) | IP-Based Matching System | |
WO2016177148A1 (en) | Short message interception method and device | |
KR20180089479A (en) | User data sharing method and device | |
CN114169438A (en) | Telecommunication network fraud identification method, device, equipment and storage medium | |
CN107172622A (en) | The identification of pseudo-base station note and analysis method, apparatus and system | |
CN117252429A (en) | Risk user identification method and device, storage medium and electronic equipment | |
WO2016037489A1 (en) | Method, device and system for monitoring rcs spam messages | |
CN106878962A (en) | method and device for determining junk information | |
CN106878994A (en) | method and device for determining junk information | |
Belém et al. | Content filtering for SMS systems based on Bayesian classifier and word grouping | |
Altuncu et al. | Deep learning based DNS tunneling detection and blocking system | |
CN110808978B (en) | Real name authentication method and device | |
CN106911660B (en) | Information management method and device | |
CN106982304A (en) | A kind of score information matching process and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |