CN106878962A - method and device for determining junk information - Google Patents


Publication number
CN106878962A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510927718.XA
Other languages
Chinese (zh)
Inventor
范国峰
常富洋
李振博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201510927718.XA priority Critical patent/CN106878962A/en
Publication of CN106878962A publication Critical patent/CN106878962A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12 Messaging; Mailboxes; Announcements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/58 Message adaptation for wireless communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud

Abstract

The invention provides a method and device for determining junk information. The method includes: receiving information from an external source and determining the information source and content of the information; judging whether the information is junk information according to the information source and, when the information is not judged to be junk information according to the information source, judging whether the information is junk information according to its content; and determining as junk information any information judged to be junk information according to the information source or the content. The technical solution provided by the invention can quickly identify whether information is junk information from its information source. In addition, while ensuring the effectiveness of junk short message identification, it avoids the infringement of user privacy caused by uploading the content of a short message directly to a server without the user's consent. Because part of the processing is performed locally, it also reduces the upload load on the client and the processing load on the cloud server, which improves identification efficiency and meets user needs.

Description

Method and device for determining junk information
Technical Field
The present invention relates to the technical field of network information security, and in particular to a method and device for determining junk information.
Background
With the development of mobile communication technology, the spread of mobile devices, and the fall in short message rates, short messages have become one of the important means of transmitting information between mobile terminals. While enjoying the convenience of short message communication, users are also harassed by junk information such as advertising messages and fraudulent messages. Such junk information degrades the user's messaging experience and poses hidden dangers to the user's personal, information, and property security. The identification and interception of junk short messages is therefore a problem in urgent need of a solution.
In the prior art, junk short messages are often identified by uploading the content of a short message directly to a server. The server identifies the content and returns the identification result to the mobile terminal. If the short message is a junk short message, the mobile terminal intercepts it or prompts the user.
The problems with this scheme are:
1. The content of a message often involves the user's personal information. Uploading the content of a short message directly to a server without the user's consent invades the user's privacy and harms the user's information security.
2. Each time the mobile terminal receives a short message, it uploads the content to the server. This process often consumes considerable network traffic, and the processing speed is limited by the network conditions of the mobile terminal, which can reduce the performance of the terminal device.
Summary of the Invention
In view of the above problems, the present invention proposes a method and device for determining junk information that overcome the above problems or at least partially solve them.
According to one aspect of the invention, a method for determining junk information is provided, including:
receiving information from an external source, and determining the information source and content of the information;
judging whether the information is junk information according to the information source and, when the information is not judged to be junk information according to the information source, judging whether the information is junk information according to the content of the information;
determining as junk information the information that is judged to be junk information according to the information source or the content of the information.
Preferably, judging whether the information is junk information according to the information source includes:
comparing the information source with the records in a locally recorded junk information source database and, when the information source is a junk information source, determining that the information is junk information; or,
sending the information source to a cloud server and receiving indication information returned by the cloud server and, when the indication information identifies the information source as a junk information source, determining that the information is junk information.
Preferably, judging whether the information is junk information according to the content of the information includes:
according to the user's selection, either uploading the content of the information directly to the cloud server or uploading substitute information for the content of the information to the cloud server;
receiving identification information returned by the cloud server, and determining whether the information is junk information according to the identification information.
Preferably, uploading the substitute information for the content of the information to the cloud server includes:
calculating the hash value corresponding to the content of the information;
uploading the hash value corresponding to the content of the information to the cloud server.
Preferably, calculating the hash value corresponding to the content of the information includes:
performing word segmentation on the content of the information;
assigning a vector value to each word obtained by segmentation, and aggregating these values to calculate the simhash value corresponding to the content of the information.
Preferably, when the information is determined to be junk information, the information source of the information is recorded in the junk information source database held locally or on the cloud server.
Preferably, the locally recorded junk information source database and the junk information source database recorded on the cloud server are updated interactively with each other.
According to another aspect of the invention, a device for determining junk information is provided, including:
a receiving module, configured to receive information from an external source and determine the information source and content of the information;
a processing module, configured to judge whether the information is junk information according to the information source and, when the information is not judged to be junk information according to the information source, to judge whether the information is junk information according to the content of the information;
a determination module, configured to determine as junk information the information that is judged to be junk information according to the information source or the content of the information.
Preferably, the processing module judging whether the information is junk information according to the information source includes:
the processing module comparing the information source with the records in a locally recorded junk information source database and, when the information source is a junk information source, the determination module determining that the information is junk information; or,
the processing module sending the information source to a cloud server, the receiving module receiving indication information returned by the cloud server and, when the indication information identifies the information source as a junk information source, the determination module determining that the information is junk information.
Preferably, the processing module judging whether the information is junk information according to the content of the information includes:
according to the user's selection, the processing module either uploading the content of the information directly to the cloud server or uploading substitute information for the content of the information to the cloud server;
the receiving module receiving identification information returned by the cloud server, and the determination module determining whether the information is junk information according to the identification information.
Preferably, the processing module uploading the substitute information for the content of the information to the cloud server includes:
the processing module calculating the hash value corresponding to the content of the information;
the processing module uploading the hash value corresponding to the content of the information to the cloud server.
Preferably, the processing module calculating the hash value corresponding to the content of the information includes:
the processing module performing word segmentation on the content of the information;
the processing module assigning a vector value to each word obtained by segmentation, and aggregating these values to calculate the simhash value corresponding to the content of the information.
Preferably, when the determination module determines that the information is junk information, the information source of the information is recorded in the junk information source database held locally or on the cloud server.
Preferably, the receiving module and the processing module update the locally recorded junk information source database and the junk information source database recorded on the cloud server interactively with each other.
The above scheme provided by the invention can quickly identify whether information is junk information from its information source. In addition, while ensuring the effectiveness of junk short message identification, it avoids the infringement of user privacy caused by uploading the content of a short message directly to a server without the user's consent. Because part of the processing is performed locally, it also reduces the upload load on the client and the processing load on the cloud server, which improves identification efficiency and meets user needs.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flow chart of a method for determining junk information according to an embodiment of the invention;
Fig. 2 shows a flow chart of a method for determining junk information according to another embodiment of the invention;
Fig. 3 shows a schematic diagram of a device for determining junk information according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of a cloud server for determining junk information according to an embodiment of the invention.
Detailed Description
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
Fig. 1 shows a flow chart of a method for determining junk information according to an embodiment of the invention. As shown in Fig. 1, the method includes:
Step S110: receiving information from an external source, and determining the information source and content of the information;
Step S120: judging whether the information is junk information according to the information source and, when the information is not judged to be junk information according to the information source, judging whether the information is junk information according to the content of the information;
Step S130: determining as junk information the information that is judged to be junk information according to the information source or the content of the information.
In the present invention, information includes but is not limited to short messages, instant messaging messages, and other conventional or unconventional information. The information source includes but is not limited to identifiers that can indicate the origin of the information, such as a mobile phone number or an information ID. Without loss of generality and for convenience of description, in some of the examples below a short message stands in for the information and a mobile phone number for the information source. It should be understood that this serves only to explain the invention and is not to be construed as limiting the claims.
The method shown in Fig. 1 is described from the perspective of the client: whether the information is junk information is first judged according to the information source and, when the information cannot be determined to be junk information from its source, the judgment is made on the content of the information itself.
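The order of steps S120 and S130 can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function names, the sample phone numbers, and the keyword-based content check are hypothetical stand-ins chosen only to show that the content check runs solely when the source check does not flag the message.

```python
from typing import Callable


def is_junk(source: str, content: str,
            source_check: Callable[[str], bool],
            content_check: Callable[[str], bool]) -> bool:
    """Two-stage decision: source first, content only as a fallback."""
    if source_check(source):
        return True          # flagged by source; content is never inspected
    return content_check(content)


# Hypothetical stand-ins for the two checks (assumptions, not from the patent).
JUNK_SOURCES = {"13800000000"}
content_checks_run = []      # records which contents were actually inspected


def check_source(source: str) -> bool:
    return source in JUNK_SOURCES


def check_content(content: str) -> bool:
    content_checks_run.append(content)
    return "grand prize" in content
```

With these stand-ins, a message from a listed number is classified without its content being touched, which is the privacy and efficiency benefit the method relies on.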
Specifically, in step S120, judging whether the information is junk information according to the information source includes:
comparing the information source with the records in a locally recorded junk information source database and, when the information source is a junk information source, determining that the information is junk information; or,
sending the information source to a cloud server and receiving indication information returned by the cloud server and, when the indication information identifies the information source as a junk information source, determining that the information is junk information.
While ensuring the effectiveness of junk information identification, the above method avoids the infringement of user privacy caused by uploading the content of the information directly to a server without the user's consent.
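The two alternatives for the source check might look like the following sketch. The text presents them as alternatives joined by "or"; trying the local database first and then the cloud is an assumption made here for illustration, and `query_cloud` is a hypothetical callable standing in for the request to the cloud server.

```python
from typing import Callable, Optional, Set


def source_is_junk(source: str,
                   local_db: Optional[Set[str]] = None,
                   query_cloud: Optional[Callable[[str], bool]] = None) -> bool:
    """Check a sender identifier against a local database and/or a cloud server."""
    # Alternative 1: compare with the locally recorded junk source database.
    if local_db is not None and source in local_db:
        return True
    # Alternative 2: send only the source identifier (never the content)
    # to the cloud server and use the indication it returns.
    if query_cloud is not None:
        return query_cloud(source)
    return False
```

Only the sender identifier leaves the device in either case, so the message content stays local at this stage.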
Furthermore, when the information cannot be determined to be junk information from its source, the judgment is made on the content of the information itself. Specifically, judging whether the information is junk information according to the content of the information includes:
according to the user's selection, either uploading the content of the information directly to the cloud server or uploading substitute information for the content of the information to the cloud server;
receiving identification information returned by the cloud server, and determining whether the information is junk information according to the identification information.
For example, when the user starts the junk short message identification client on the mobile terminal for the first time, a statement agreement pops up asking whether the user agrees to have short message content uploaded directly to the cloud server. If the user agrees, then when the mobile terminal receives a short message, the content of the short message is uploaded directly to the cloud server for identification. If the user does not agree, then when the mobile terminal receives a short message, the previously described step of uploading substitute information for the content to the cloud server is performed. By starting from the user's wishes, this embodiment fundamentally addresses the problems in the prior art of invading user privacy and endangering user information security.
Specifically, uploading the substitute information for the content of the information to the cloud server includes:
calculating the hash value corresponding to the content of the information;
uploading the hash value corresponding to the content of the information to the cloud server.
Furthermore, calculating the hash value corresponding to the content of the information includes:
performing word segmentation on the content of the information;
assigning a vector value to each word obtained by segmentation, and aggregating these values to calculate the simhash value corresponding to the content of the information.
For example, taking a short message as the information, the hash value corresponding to the content of the short message serves as the object of identification, and junk short messages are identified through the interaction between the client and the cloud server. While ensuring the effectiveness of junk short message identification, this method avoids the infringement of user privacy caused by uploading the content of a short message directly to a server without the user's consent. Because part of the processing is performed locally, it also reduces the interaction load between the client and the cloud server and the processing load on the cloud server, which improves identification efficiency and meets user needs.
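The consent-dependent upload can be sketched as follows. The payload field names (`kind`, `value`) are hypothetical, and SHA-1 is used only as a readily available stand-in for "the hash value corresponding to the content"; the embodiments described later prefer a simhash value.

```python
import hashlib


def build_upload(content: str, user_agreed: bool) -> dict:
    """Return what the client would send to the cloud server for content checks."""
    if user_agreed:
        # The user accepted the statement agreement: upload the raw content.
        return {"kind": "content", "value": content}
    # Otherwise upload only substitute information derived from the content.
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
    return {"kind": "hash", "value": digest}
```

When consent is withheld, the server sees only a fixed-length digest, which is also smaller than most message bodies and so reduces upload traffic.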
In one embodiment of the invention, the method shown in Fig. 1 further includes:
when the short message is determined to be a junk short message, performing interception processing on the short message. The interception processing here may specifically include: deleting the junk short message directly; or moving the junk short message to a specified folder to await handling by the user; or prompting the user that the message is a junk short message.
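The three interception options could be modelled as below. The enum, the list-based inbox and folder, and the prompt text are all hypothetical and serve only to make the three behaviours concrete.

```python
from enum import Enum
from typing import List


class InterceptAction(Enum):
    DELETE = "delete"
    MOVE_TO_FOLDER = "move_to_folder"
    PROMPT_USER = "prompt_user"


def intercept(message: str, action: InterceptAction,
              inbox: List[str], junk_folder: List[str],
              prompts: List[str]) -> None:
    """Apply one of the three interception options to a junk short message."""
    if action is InterceptAction.DELETE:
        inbox.remove(message)                 # delete directly
    elif action is InterceptAction.MOVE_TO_FOLDER:
        inbox.remove(message)                 # move to the specified folder
        junk_folder.append(message)
    else:
        # Leave the message in place and warn the user.
        prompts.append("Possible junk short message: " + message)
```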
In one embodiment of the invention, calculating the hash value corresponding to the content of the short message means: mapping the content of the short message to a fixed-length value according to a traditional hash algorithm. This value is called the hash value, and it is a unique and extremely compact numerical representation of the content of the short message.
The hash algorithms referred to in this embodiment include HAVAL, MD2, MD4, MD5, SHA1, and the like. Traditional hash algorithms of this kind all share one essential characteristic: hash collisions rarely occur in the input domain, which means that two texts differing by only a single byte can map to two completely different hash values.
For example, suppose the contents of two fraudulent short messages are "Congratulations, you have won a grand prize of 50,000 yuan" and "Congratulations, you have won a grand prize of 10,000 yuan", and the hash values calculated for the two messages by a traditional hash algorithm are 286 and 523 respectively. The two messages use nearly identical fraud tactics, yet their hash values are entirely different, which may place an excessive identification load on the cloud server in subsequent processing.
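The effect is easy to reproduce with Python's standard `hashlib`. The sketch below hashes two near-identical strings in the spirit of the example above with SHA-1, one of the algorithms the text lists. The English strings are illustrative, and the digests will not match the small numbers 286 and 523 quoted in the text.

```python
import hashlib

msg_a = "Congratulations, you have won a grand prize of 50,000 yuan"
msg_b = "Congratulations, you have won a grand prize of 10,000 yuan"

digest_a = hashlib.sha1(msg_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha1(msg_b.encode("utf-8")).hexdigest()

# Count hex positions where the two digests disagree.
differing = sum(1 for x, y in zip(digest_a, digest_b) if x != y)

print(digest_a)
print(digest_b)
print(differing, "of", len(digest_a), "hex digits differ")
```

A server that matches on such digests must store one entry per variant of a template, which is the load problem the simhash-based embodiment is meant to reduce.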
Therefore, to remove the deviation between the hash values of short messages whose contents differ only slightly, and so reduce the identification load on the cloud server in subsequent processing, another embodiment of the invention calculates a simhash value instead.
In this embodiment, calculating the hash value corresponding to the content of the short message means calculating the simhash value corresponding to the content of the short message. The detailed process is as follows:
Word segmentation is performed on the content of the short message.
A vector value is assigned to each word obtained by segmentation, and these values are aggregated to calculate the simhash value corresponding to the content of the short message.
The scheme of the invention is described in detail below, taking Table 1 and Table 2 as examples.
Table 1: A process of calculating the simhash value corresponding to the content of a short message
Table 1 shows the process of calculating the simhash value corresponding to the content of a short message according to one specific embodiment of the invention. As shown in Table 1, in this embodiment the short message received by the mobile terminal is: "Our company issues ordinary invoices on behalf of others; our company does not issue VAT special invoices or professional invoices on behalf of others."
First, the vector corresponding to the simhash value is initialized: A = A0 = {0,0,0,0,0,0}.
Then, word segmentation is performed on the content of the short message: our company / issue-on-behalf / ordinary / invoice /, our company / not / issue-on-behalf / VAT / special / invoice / and / professional / invoice. The distinct words obtained after segmentation are: our company, issue-on-behalf, not, VAT, special, invoice, ordinary, and, professional.
According to a traditional hash algorithm, the 6-bit hash value corresponding to each word is calculated: our company: 100110; issue-on-behalf: 110000; not: 101111; VAT: 110001; special: 010110; invoice: 101011; ordinary: 110100; and: 110110; professional: 001001.
Next, the word frequency of each word is calculated and used as its vector value, representing the weight of the word in the content of the short message: our company: 2; issue-on-behalf: 2; not: 1; VAT: 1; special: 1; invoice: 3; ordinary: 1; and: 1; professional: 1.
These form a vector B: {our company/100110/2, issue-on-behalf/110000/2, not/101111/1, VAT/110001/1, special/010110/1, invoice/101011/3, ordinary/110100/1, and/110110/1, professional/001001/1}.
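The word-frequency weights above can be reproduced with a short sketch. The token list is a hypothetical English gloss of the segmented message (a real client would run a Chinese word segmenter, which is outside the scope of this sketch), and `collections.Counter` does the counting.

```python
from collections import Counter

# English glosses of the segmented message, in order of appearance.
tokens = [
    "our company", "issue-on-behalf", "ordinary", "invoice",
    "our company", "not", "issue-on-behalf", "VAT", "special", "invoice",
    "and", "professional", "invoice",
]

# Word frequency serves as the weight of each word.
weights = Counter(tokens)

for word, weight in weights.items():
    print(word, weight)
```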
Each word in vector B is processed in turn, as follows. For each word, if the i-th bit of its hash value is "1", the word frequency of the word is added to the i-th dimension of vector A; if the i-th bit of its hash value is "0", the word frequency of the word is subtracted from the i-th dimension of vector A. For example, for our company/100110/2, vector A becomes {2,-2,-2,2,2,-2}; for issue-on-behalf/110000/2, vector A becomes {2,2,-2,-2,-2,-2}. Proceeding in the same way gives the vector A corresponding to each word, as shown in Table 1.
The vectors A corresponding to the individual words are summed to obtain the vector Atotal = {9,-1,-3,1,5,1}. If the i-th dimension of this vector is positive, the i-th dimension of the vector corresponding to the simhash value is set to "1"; if the i-th dimension is negative, the i-th dimension of the vector corresponding to the simhash value is set to "0". This gives the final vector corresponding to the simhash value, Afinal = {1,0,0,1,1,1}.
Therefore, the simhash value of the short message "Our company issues ordinary invoices on behalf of others; our company does not issue VAT special invoices or professional invoices on behalf of others." is 100111.
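The per-word update and the final sign step can be sketched as three small functions. The function names are hypothetical. Mapping a zero component to "1" is an assumption: the text defines only the positive and negative cases, and zero-to-one is the choice that agrees with the summed vectors quoted in this description.

```python
from typing import List, Sequence


def word_vector(bits: str, weight: int) -> List[int]:
    """+weight where the word's hash bit is 1, -weight where it is 0."""
    return [weight if b == "1" else -weight for b in bits]


def sum_vectors(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Dimension-wise sum of the per-word vectors."""
    return [sum(column) for column in zip(*vectors)]


def finalize(total: Sequence[int]) -> str:
    """Keep only the sign of each dimension (zero treated as 1: an assumption)."""
    return "".join("1" if v >= 0 else "0" for v in total)


# The two per-word vectors worked out in the text.
v_company = word_vector("100110", 2)   # [2, -2, -2, 2, 2, -2]
v_issue = word_vector("110000", 2)     # [2, 2, -2, -2, -2, -2]

# Applying the sign step to the summed vector quoted in the text.
print(finalize([9, -1, -3, 1, 5, 1]))  # 100111
```

Because only the signs of the summed components survive, small changes in wording that do not flip any sign leave the simhash value unchanged.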
Table 2: Another process of calculating the simhash value corresponding to the content of a short message
Table 2 shows the process of calculating the simhash value corresponding to the content of a short message according to another specific embodiment of the invention. As shown in Table 2, in this embodiment the short message received by the mobile terminal is: "Our company issues ordinary invoices on behalf of others; our company does not issue special invoices or professional invoices on behalf of others." The simhash value is calculated in the same way as in Table 1, and the details are not repeated here. As Table 2 shows, summation gives the vector Atotal = {8,-2,-2,0,6,0}, and the final vector corresponding to the simhash value is Afinal = {1,0,0,1,1,1}; in this example the zero components are mapped to "1". Therefore, the simhash value of the short message "Our company issues ordinary invoices on behalf of others; our company does not issue special invoices or professional invoices on behalf of others." is 100111, the same as the simhash value of the short message "Our company issues ordinary invoices on behalf of others; our company does not issue VAT special invoices or professional invoices on behalf of others."
From the above, in calculating the simhash value, the weight of each word is retained while the specific magnitude of each word's hash value is gradually disregarded; only the sign of each component after summation is used. Similar short messages, which have similar text structure, tend to yield summed vectors Atotal with the same signs. Similar short messages can therefore have the same simhash value, which overcomes the tendency of traditional hash algorithms to scatter similar inputs to unrelated hash values.
Additionally, in other embodiments of the invention, vector values may be assigned to the words obtained by segmentation in other ways.
In an embodiment of the invention, when a piece of information is determined to be junk information, the information source of that information is recorded in the junk information source database held locally or on the cloud server. That is, the junk information source database in the invention continuously records new information sources that have been classified as sources of junk information.
Furthermore, the locally recorded junk information source database and the junk information source database recorded on the cloud server are updated interactively with each other.
Therefore, taking a mobile phone as the terminal device, when the phone frequently sends junk short messages, its phone number will quickly be added to the junk information source database. Through propagation over the Internet, and once the local junk information source databases of clients have been updated in real time, further junk short messages from that phone are blocked and shielded immediately by the other clients that now recognize it as a junk information source. Even if a client receives junk information sent by the phone, it can identify the information as junk immediately from the information source, without analyzing the content of the information.
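A minimal sketch of recording new junk sources and keeping the two databases in step follows. Modelling each database as an in-memory set and synchronizing by taking the union are assumptions; the text does not specify the storage format or the update protocol.

```python
from typing import Set


def record_junk_source(db: Set[str], source: str) -> None:
    """Record the source of a message that was determined to be junk."""
    db.add(source)


def sync(local_db: Set[str], cloud_db: Set[str]) -> None:
    """Interactive update: afterwards both databases hold every known source."""
    merged = local_db | cloud_db
    local_db.update(merged)
    cloud_db.update(merged)
```

After a sync, a number first flagged by one client is known to every client that synchronizes with the same cloud database, which is how a frequent sender comes to be blocked on source alone.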
It should be appreciated that present invention method disclosed above, although be described with the angle of client, But its partial function can also be performed in server end, the technical scheme of this part is it will be also be appreciated that originally The category of disclosure of the invention.
Fig. 2 shows a flow chart of a method for determining junk information according to another embodiment of the present invention. As shown in Fig. 2, the method includes:
Step S210: receive the sending number of a short message, or the hash value corresponding to the content of the short message, uploaded by the client.
Step S220: match the sending number of the short message, or the hash value corresponding to the content of the short message, against the hash value library.
In this step, the cloud server uses the junk information source database built from its historical identification records to judge whether the number of the uploaded short message has already been recorded, i.e., whether it is present in the junk information source database.
Alternatively, in this step, the hash value library stores the hash values corresponding to different short message contents together with the black or white identification information already determined for each of them. Black identification information indicates that the information is junk information; white identification information indicates that the information is not junk information.
In one embodiment of the invention, the hash value library is built by the cloud server from its historical identification records. Each time the cloud server identifies spam short messages, whichever identification method is chosen, it records features of the identified short message, such as its content, keywords, or hash value, together with the corresponding identification information; the records pairing hash values with identification information are then taken to build the hash value library.
Step S230: return the identification information to the client.
It can be seen that, the method shown in Fig. 2 describes cloud server and receives the transmission that client is sent After the number of short message or the content correspondence cryptographic Hash of short message, identification information is returned to the mistake of client Journey.The method is on the basis of refuse messages identification validity is ensured, it is to avoid agree to without user In the case of the content of short message is uploaded directly into infringement individual subscriber privacy caused by server Problem;And by alleviating the processing pressure and high in the clouds clothes of cloud server after local calculating treatment Business device interacts burden with client, improves recognition efficiency, meets user's request.
In one embodiment of the invention, taking a mobile phone as an example of a terminal device, when a phone frequently sends spam short messages, its phone number is quickly added to the junk information source database by the cloud server. When a client receives junk information sent by that phone, the cloud server immediately identifies it as junk information according to the information source, without needing to analyze the content of the short message.
In one embodiment of the invention, the hash value corresponding to the content of the short message received by the cloud server is the simhash value corresponding to the content of the short message; correspondingly, the hash value library of the cloud server is specifically a simhash value library.
In one embodiment of the invention, the method shown in Fig. 2 further includes:
Step S240 (not shown): receive short message content reported by users.
Step S250 (not shown): identify each reported short message content as black or white, generate the corresponding simhash value, and save the simhash value and the corresponding identification information in the hash value library.
In this step, the simhash value corresponding to the content of the short message is calculated by the cloud server; the process is similar to the process by which the client calculates the simhash value, described above, and will not be repeated here.
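Steps S240 and S250 amount to labelling a reported message and storing its hash value with that label. The sketch below illustrates this under stated assumptions: `save_report`, `toy_classifier`, and `toy_hash` are hypothetical names, the keyword classifier is a placeholder for whatever identification method the server uses, and `zlib.crc32` merely stands in for the simhash calculation so the example stays short.

```python
import zlib
from typing import Callable, Dict


def save_report(hash_library: Dict[int, str],
                content: str,
                classify: Callable[[str], bool],
                content_hash: Callable[[str], int]) -> int:
    """S250: identify a reported message as black or white and store its hash."""
    label = "black" if classify(content) else "white"
    key = content_hash(content)
    hash_library[key] = label
    return key


def toy_classifier(text: str) -> bool:
    # Placeholder identification method (assumption): a single keyword rule.
    return "free prize" in text.lower()


def toy_hash(text: str) -> int:
    # Stand-in for the simhash calculation (assumption): CRC32 of the UTF-8 bytes.
    return zlib.crc32(text.encode("utf-8"))


library: Dict[int, str] = {}
key = save_report(library, "Claim your FREE PRIZE now", toy_classifier, toy_hash)
print(library[key])
```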
Fig. 3 shows a schematic diagram of a device for determining junk information according to an embodiment of the present invention. As shown in Fig. 3, the device 300 for determining junk information includes:
Receiver module 310, configured to receive information from outside and determine the information source and content of the information;
Processing module 320, configured to judge whether the information is junk information according to the information source and, when the information is judged not to be junk information according to the information source, to judge whether the information is junk information according to the content of the information;
Determination module 330, configured to determine as junk information the information that has been judged to be junk information by the information source or by the content of the information.
As an embodiment of the device 300 for determining junk information, the processing module 320 being configured to judge whether the information is junk information according to the information source includes:
The processing module 320 is configured to compare the information source with the records in the locally recorded junk information source database, and when the information source is a junk information source, the determination module 330 determines the information to be junk information; or,
The processing module 320 is configured to send the information source to the cloud server, the receiver module 310 is configured to receive the indication information returned by the cloud server, and when the indication information indicates that the information source is a junk information source, the determination module 330 determines the information to be junk information.
Furthermore, the processing module 320 being configured to judge whether the information is junk information according to the content of the information includes:
According to the user's selection, the processing module 320 is configured to upload the content of the information directly to the cloud server, or the processing module 320 is configured to upload substitute information for the content of the information to the cloud server;
The receiver module 310 is configured to receive the identification information returned by the cloud server, and the determination module 330 is configured to determine whether the information is junk information according to the identification information.
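The two-stage judgment performed by modules 310 to 330 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `is_junk`, the `ask_cloud` callback, and its `(kind, payload)` arguments are hypothetical; only the local variant of the source check is shown, although the text also allows the information source to be sent to the cloud server.

```python
from typing import Callable, Set, Union


def is_junk(source: str,
            content: str,
            local_junk_sources: Set[str],
            ask_cloud: Callable[[str, Union[str, int]], bool],
            upload_raw_content: bool,
            content_hash: Callable[[str], int]) -> bool:
    """Judge by information source first; fall back to content only if needed."""
    # Stage 1: compare the source with the locally recorded junk source database.
    if source in local_junk_sources:
        return True  # no content analysis required

    # Stage 2: according to the user's selection, upload either the content
    # itself or substitute information (here, a hash value of the content).
    if upload_raw_content:
        return ask_cloud("content", content)
    return ask_cloud("hash", content_hash(content))
```

Passing the cloud call and the hash function in as parameters keeps the sketch independent of any particular transport or hash algorithm.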
As an embodiment of the device 300 for determining junk information, the processing module 320 being configured to upload substitute information for the content of the information to the cloud server includes:
The processing module 320 is configured to calculate the hash value corresponding to the content of the information;
The processing module 320 is configured to upload the hash value corresponding to the content of the information to the cloud server.
Furthermore, the processing module 320 being configured to calculate the hash value corresponding to the content of the information includes:
The processing module 320 is configured to perform word segmentation on the content of the information;
The processing module 320 is configured to assign a different vector value to each word obtained by the segmentation, and to aggregate these values to calculate the simhash value corresponding to the content of the information.
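The segmentation, per-word vector, and aggregation steps above follow the general shape of the simhash technique, which can be sketched as below. The details are assumptions and may differ from the embodiment in Table 1: whitespace splitting stands in for real word segmentation (Chinese text would need a proper segmenter), term frequency is used as the word weight, and each word's bit vector is taken from the first 8 bytes of its MD5 digest.

```python
import hashlib
from collections import Counter


def token_hash(token: str, bits: int = 64) -> int:
    """Map a word to a fixed-width bit vector (first bits//8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(text: str, bits: int = 64) -> int:
    """Aggregate weighted per-word bit vectors into one fingerprint."""
    # Word segmentation (assumption: whitespace split) with term-frequency weights.
    weights = Counter(text.lower().split())
    totals = [0] * bits
    for token, weight in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # A set bit votes +weight for position i, a clear bit votes -weight.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because every word only contributes votes to each bit position, changing a few words tends to flip only some bits of the fingerprint, which is why two fingerprints are usually compared by Hamming distance rather than by equality.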
The processing module 320 sends the information source of the information, or the hash value corresponding to the content of the information, to the cloud server, and after making its judgment the cloud server returns identification information to the receiver module 310. Therefore, while ensuring the effectiveness of spam short message identification, the device 300 for determining junk information avoids the infringement of the user's personal privacy that would result from uploading the content of a short message directly to the server without the user's consent. Moreover, because the computation is performed locally, the processing pressure on the cloud server and the interaction burden between the cloud server and the client are reduced, which improves identification efficiency and meets user needs.
In one embodiment of the invention, the processing module 320 is adapted to calculate the hash value corresponding to the content of the short message according to a traditional hash algorithm. The hash algorithms described in this embodiment include HAVAL, MD2, MD4, MD5, SHA1, and the like. As can be understood from the foregoing, such traditional hash algorithms all share one basic characteristic: hash collisions rarely occur in the input domain, i.e., even two texts that differ by only one byte are mapped to two entirely different hash values.
Therefore, in order to remove the deviation between the hash values corresponding to short message contents that differ only slightly, and to reduce the identification pressure on the cloud server in subsequent processing, in another embodiment of the present invention the processing module 320 is adapted to perform word segmentation on the content of the short message, assign a different vector value to each word obtained by the segmentation, and aggregate these values to calculate the simhash value corresponding to the content of the short message. A specific embodiment in which the processing module 320 calculates the simhash value corresponding to the content of a short message is shown in Table 1; it has been described in detail above and will not be repeated here.
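The property of traditional hash algorithms mentioned above can be observed directly with MD5. The two sample messages and the helper `differing_bits` are made up for illustration; the point is only that a one-byte change yields an unrelated digest, so near-duplicate spam cannot be matched by comparing such hash values.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two sample messages that differ in exactly one byte ("1" versus "2").
msg_a = b"Your parcel is waiting, reply 1 to confirm"
msg_b = b"Your parcel is waiting, reply 2 to confirm"

md5_a = hashlib.md5(msg_a).digest()
md5_b = hashlib.md5(msg_b).digest()

print(md5_a.hex())
print(md5_b.hex())
print(differing_bits(md5_a, md5_b), "of", len(md5_a) * 8, "bits differ")
```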
Additionally, when the determination module 330 determines that the information is junk information, the information source of the information is recorded in the junk information source database, either locally or on the cloud server.
Furthermore, the receiver module 310 and the processing module 320 keep the locally recorded junk information source database and the junk information source database recorded on the cloud server updated against each other.
Therefore, taking a mobile phone as an example of a terminal device, when a phone frequently sends spam short messages, its phone number is quickly added to the junk information source database by the devices 300 for determining junk information, and this record propagates over the internet. When the phone continues to send spam short messages, they are immediately intercepted and blocked by the other clients, which now recognize it as a junk information source. Even if the device 300 for determining junk information receives junk information sent by that phone, it can immediately identify the information as junk from the information source, without analyzing the content of the information.
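Recording a junk information source and keeping the local and cloud databases in step can be sketched with plain sets. This is a minimal sketch: the function names are hypothetical, and modelling the mutual update as a set union is an assumption, since the text does not specify how the two databases are reconciled.

```python
from typing import Set


def record_junk_source(database: Set[str], source: str) -> None:
    """Record the information source of a message determined to be junk."""
    database.add(source)


def sync_source_databases(local: Set[str], cloud: Set[str]) -> None:
    """Mutually update both databases so that each ends up holding the union."""
    merged = local | cloud
    local.update(merged)
    cloud.update(merged)
```

After a sync, a number first reported by one client is present in the local database of every other client that has synced, which is what allows blocking by source alone.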
Fig. 4 shows a schematic diagram of a cloud server for determining junk information according to an embodiment of the present invention.
As shown in Fig. 4, the cloud server 400 for determining junk information includes:
Receiving unit 410, configured to receive the sending number of a short message, or the hash value corresponding to the content of the short message, uploaded by the client.
Recognition unit 420, configured to match the sending number of the short message, or the hash value corresponding to the content of the short message, against the hash value library.
In this unit, the cloud server uses the junk information source database built from its historical identification records to judge whether the number of the uploaded short message has already been recorded, i.e., whether it is present in the junk information source database.
Alternatively, in this unit, the hash value library stores the hash values corresponding to different short message contents together with the black or white identification information already determined for each of them. In one embodiment of the invention, the hash value library is built by the cloud server 400 from its historical identification records. Each time the cloud server 400 identifies spam short messages, whichever identification method is chosen, it records features of the identified short message, such as its content, keywords, or hash value, together with the corresponding identification information; the records pairing hash values with identification information are then taken to build the hash value library.
Feedback unit 430, configured to return the identification information to the client.
It can be seen that, the scheme shown in Fig. 4 illustrates that receiving unit 410 receives the hair that client is sent After sending the number of short message or the content correspondence cryptographic Hash of short message, feedback unit 430 returns to identification information To the process of client.The program is on the basis of refuse messages identification validity is ensured, it is to avoid The content of short message is uploaded directly into infringement caused by server in the case of agreeing to without user to use The problem of family individual privacy;And by alleviating the treatment pressure of cloud server after local calculating treatment Power and cloud server interact burden with client, improve recognition efficiency, meet user's request.
In one embodiment of the invention, by taking terminal device mobile phone as an example, when often to outgoing rubbish During short message, the phone number of the mobile phone will quickly be identified unit 420 and include into junk information source data Storehouse.When client receive the mobile phone transmission junk information, recognition unit 420 can the very first time according to Information source and to be identified be junk information, and the content in short message need not be analyzed.
In one embodiment of the invention, the content pair of the short message received by receiving unit 410 The cryptographic Hash answered is the corresponding simhash values of content of the short message, correspondingly, cloud server Cryptographic Hash storehouse be specially simhash values storehouse.
In one embodiment of the invention, receiving unit 410, are further adapted for receiving user's report Short message content;Recognition unit 420, is further adapted for carrying out each short message content of user's report black Or white identification, and corresponding simhash values are generated, by simhash values and corresponding identification information It is saved in the cryptographic Hash storehouse.Wherein, recognition unit 420 calculates the content correspondence simhash of short message Value, its process is similar with the process that the above client calculates simhash, no longer goes to live in the household of one's in-laws on getting married herein State.
Those skilled in the art will appreciate that the present invention includes devices for performing one or more of the operations described in this application. These devices may be specially designed and manufactured for the required purposes, or may comprise known devices in general-purpose computers. These devices have computer programs stored in them, and these computer programs are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (for example, computer-readable) medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus. The computer-readable medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
Those skilled in the art will appreciate that computer program instructions can be used to implement each block of these structure diagrams and/or block diagrams and/or flow diagrams, as well as combinations of blocks in them. Those skilled in the art will also appreciate that these computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing method for implementation, so that the schemes specified in the block or blocks of the structure diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention are executed by the processor of the computer or other programmable data processing method.
Those skilled in the art will appreciate that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that correspond to the various operations, methods, and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for determining junk information, characterized by comprising:
receiving information from outside, and determining the information source and content of the information;
judging whether the information is junk information according to the information source, and, when the information is judged not to be junk information according to the information source, judging whether the information is junk information according to the content of the information;
determining as junk information the information that has been judged to be junk information by the information source or by the content of the information.
2. The method according to claim 1, characterized in that judging whether the information is junk information according to the information source comprises:
comparing the information source with the records in a locally recorded junk information source database, and determining the information to be junk information when the information source is a junk information source; or,
sending the information source to a cloud server and receiving the indication information returned by the cloud server, and determining the information to be junk information when the indication information indicates that the information source is a junk information source.
3. The method according to claim 1 or 2, characterized in that judging whether the information is junk information according to the content of the information comprises:
according to the user's selection, uploading the content of the information directly to a cloud server, or uploading substitute information for the content of the information to the cloud server;
receiving the identification information returned by the cloud server, and determining whether the information is junk information according to the identification information.
4. The method according to claim 3, characterized in that uploading substitute information for the content of the information to the cloud server comprises:
calculating the hash value corresponding to the content of the information;
uploading the hash value corresponding to the content of the information to the cloud server.
5. The method according to claim 4, characterized in that calculating the hash value corresponding to the content of the information comprises:
performing word segmentation on the content of the information;
assigning a different vector value to each word obtained by the segmentation, and aggregating these values to calculate the simhash value corresponding to the content of the information.
6. The method according to claim 3, characterized in that, when the information is determined to be junk information, the information source of the information is recorded in the junk information source database locally or on the cloud server.
7. The method according to claim 6, characterized in that the locally recorded junk information source database and the junk information source database recorded on the cloud server are updated against each other.
8. A device for determining junk information, characterized by comprising:
a receiver module, configured to receive information from outside and determine the information source and content of the information;
a processing module, configured to judge whether the information is junk information according to the information source and, when the information is judged not to be junk information according to the information source, to judge whether the information is junk information according to the content of the information;
a determination module, configured to determine as junk information the information that has been judged to be junk information by the information source or by the content of the information.
9. The device according to claim 8, characterized in that the processing module being configured to judge whether the information is junk information according to the information source comprises:
the processing module is configured to compare the information source with the records in a locally recorded junk information source database, and when the information source is a junk information source, the determination module determines the information to be junk information; or,
the processing module is configured to send the information source to a cloud server, the receiver module is configured to receive the indication information returned by the cloud server, and when the indication information indicates that the information source is a junk information source, the determination module determines the information to be junk information.
10. The device according to claim 8 or 9, characterized in that the processing module being configured to judge whether the information is junk information according to the content of the information comprises:
according to the user's selection, the processing module is configured to upload the content of the information directly to a cloud server, or the processing module is configured to upload substitute information for the content of the information to the cloud server;
the receiver module is configured to receive the identification information returned by the cloud server, and the determination module is configured to determine whether the information is junk information according to the identification information.
CN201510927718.XA 2015-12-14 2015-12-14 method and device for determining junk information Pending CN106878962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510927718.XA CN106878962A (en) 2015-12-14 2015-12-14 method and device for determining junk information

Publications (1)

Publication Number Publication Date
CN106878962A true CN106878962A (en) 2017-06-20

Family

ID=59178614

Country Status (1)

Country Link
CN (1) CN106878962A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144279A1 (en) * 2003-12-31 2005-06-30 Wexelblat David E. Transactional white-listing for electronic communications
CN104254074A (en) * 2013-06-28 2014-12-31 腾讯科技(深圳)有限公司 Method and device for intercepting spam short messages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170620