CN104427503B - A kind of information filtering method and device - Google Patents
- Publication number: CN104427503B (application CN201310403218.7A)
- Authority: CN (China)
- Prior art keywords: account, opposite end, communicated, group, core
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
Abstract
The invention discloses an information filtering method. The method includes: determining, from the communication records of each account, the core contacts group and the non-core contacts group corresponding to that account; determining the reputation value of the account from the communication records of each member of the core contacts group and of each member of the non-core contacts group; and determining, from the reputation value of the account, whether the account is a rubbish account. The invention also discloses an information filtering device. With this technical solution, the information sent by rubbish accounts is intercepted, thereby ensuring the recall ratio and precision ratio for junk information.
Description
Technical field
The present invention relates to information filtering technology, and more particularly to an information filtering method and device.
Background technology
The flood of spam short messages, advertisements, and similar information is a major problem troubling users and mobile communication operators, so information needs to be identified and filtered. The information filtering means widely used by operators today fall into two main classes, user-level filtering and single-message-level filtering, whose objects of processing are a single sending user and a single message, respectively.
User-level filtering methods mainly include the frequency-threshold method and the blacklist/whitelist method. The frequency-threshold method sets a frequency threshold during information interception. The threshold is set mainly by experience, so subjective factors cause many legitimate short messages to be misjudged and much junk information to be missed, and the effect is poor. The blacklist/whitelist method overcomes this defect, but the lists must be maintained by hand, which is time-consuming and laborious, and they grow ever larger in use, which affects the timeliness of short message delivery.
Single-message-level filtering mainly comprises keyword filtering. Keyword filtering first requires the dictionary to be updated continually, and because keywords are hard to select, it cannot guarantee that all junk keywords are filtered out. Moreover, keyword matching can hardly judge whether the content of a short message is legitimate, so misjudgments are easy, and a keyword list is easily bypassed by using pinyin, deliberately miswritten characters, homophones, or many symbols inserted into the message. There is therefore an urgent need for an information filtering method and device with high recall and precision ratios for junk information.
Invention content
In view of this, the main purpose of the present invention is to provide an information filtering method and device that intercept the information sent by rubbish accounts, thereby ensuring the recall ratio and precision ratio for junk information.
To achieve the above purpose, the technical solution of the present invention is realized as follows:
The present invention provides an information filtering method. The method includes: determining, from the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account; determining the reputation value of the account from the communication records of each member of the core contacts group and the communication records of each member of the non-core contacts group; and determining, from the reputation value of the account, whether the account is a rubbish account.
In the above solution, determining the core contacts group and the non-core contacts group corresponding to the account from the communication records of each account includes, for each account A: determining the opposite end accounts Ap that communicate with account A, where 0≤p≤N and N is the total number of accounts that communicate with account A; determining whether a correspondence exists between the opposite end accounts of account A; assigning the opposite end accounts that have a correspondence with another opposite end account to the core contacts group of account A; and otherwise assigning the opposite end account to the non-core contacts group of account A.
In the above solution, determining whether a correspondence exists between the opposite end accounts of account A includes: comparing the number of communications between any two opposite end accounts with a preset communication threshold; when the number of communications between the two opposite end accounts is greater than or equal to the communication threshold, determining that a correspondence exists between the two opposite end accounts; and otherwise determining that no correspondence exists between the two opposite end accounts.
In the above solution, before the core contacts group corresponding to the account is determined from the communication records of each account, the method further includes: determining the communication records of each account by extraction, transformation and loading (ETL) of data, and/or by a blacklist/whitelist method.
In the above solution, the reputation value R^(n+1)(A) of the account is determined from the communication records of each member of the core contacts group and the communication records of each member of the non-core contacts group by a formula in which: R^(n+1)(A) denotes the reputation value of account A at the (n+1)th iteration; Ap is an opposite end account that communicates with account A, 0≤p≤N, and N is the total number of opposite end accounts that communicate with account A; ω denotes the damping coefficient; the iteration initial value is R^(0)(A)=1; and τ(A, Ap) is the degree-of-belief evaluation of account A by opposite end account Ap.
When the opposite end account Ap that communicates with account A belongs to the core contacts group of account A, the degree of belief between account A and opposite end account Ap is determined by a formula in which R(Ap) denotes the reputation value of opposite end account Ap, R(Ap)=1, and k denotes the number of accounts in the core contacts group of Ap with which Ap communicates.
When the opposite end account Ap that communicates with account A does not belong to the core contacts group of account A, the degree of belief between account A and opposite end account Ap is determined by a formula in which: R(Ap) denotes the reputation value of opposite end account Ap, R(Ap)=1; h denotes the number of accounts in the non-core contacts group of Ap with which Ap communicates; l denotes the number of times opposite end account Ap communicates with account A; and m denotes the relationship between opposite end account Ap and the other opposite end accounts of account A. m depends on the number f of contact accounts common to the contacts group of account A and the contacts group of opposite end account Ap, with m=f+1.
In the above solution, the method further includes: intercepting the information sent by the rubbish account.
The present invention also provides an information filtering device. The device includes a first determination unit, a second determination unit, and a third determination unit, wherein: the first determination unit is configured to determine, from the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account; the second determination unit is configured to determine the reputation value of the account from the communication records of each member of the core contacts group and the communication records of each member of the non-core contacts group; and the third determination unit is configured to determine, from the reputation value of the account, whether the account is a rubbish account.
In the above solution, the first determination unit further includes a first determination subunit, a second determination subunit, and a third determination subunit, wherein: the first determination subunit is configured to determine the opposite end accounts Ap that communicate with account A, where 0≤p≤N and N is the total number of accounts that communicate with account A; the second determination subunit is configured to determine whether a correspondence exists between the opposite end accounts of the account; and the third determination subunit is configured to assign the opposite end accounts that have a correspondence with another opposite end account to the core contacts group of account A, and otherwise to assign the opposite end account to the non-core contacts group of account A.
In the above solution, the second determination subunit further includes a setting module, a judgment module, and a determination module, wherein: the setting module is configured to preset the communication threshold for the number of communications between any two opposite end accounts; the judgment module is configured to compare the number of communications between any two opposite end accounts with the communication threshold; and the determination module is configured to determine that a correspondence exists between the two opposite end accounts when the number of communications between them is greater than or equal to the communication threshold, and otherwise to determine that no correspondence exists between the two opposite end accounts.
In the above solution, the device further includes a fourth determination unit and an interception unit, wherein: the fourth determination unit is configured to determine the communication records of each account by extraction, transformation and loading (ETL) of data, and/or by a blacklist/whitelist method; and the interception unit is configured to intercept the information sent by the rubbish account.
The information filtering method and device provided by the present invention determine, from the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account; determine the reputation value of the account from the communication records of each member of the core contacts group and of each member of the non-core contacts group; determine, from the reputation value of the account, whether the account is a rubbish account; and intercept the information sent by the rubbish account, thereby ensuring the recall ratio and precision ratio for junk information.
Further, the present invention determines the members of the core contacts group of an account from the communication records of the account. Each account in the core contacts group is determined by the existence of communication records among at least three accounts, which avoids the deviation that conventional methods introduce through the setting of a frequency threshold and the incorrect selection of the core contacts group that results from it.
Further, the present invention moves the determination of the core contacts group and of the reputation value onto a cloud computing platform. This saves storage and computing resources, and the distribution and merge model of cloud computing allows the computation to run in distributed parallel form on the cloud platform, which ensures efficient processing.
Description of the drawings
Fig. 1 is a schematic flowchart of the information filtering method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a specific implementation of step 102 in Fig. 1;
Fig. 3 is a schematic structural diagram of a contacts group in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the information filtering device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the first determination unit in Fig. 4;
Fig. 6 is a schematic structural diagram of the second determination subunit in Fig. 5;
Fig. 7 is a schematic structural diagram of the information filtering device when an embodiment of the present invention uses a cloud computing platform.
Specific implementation mode
The basic idea of the present invention is as follows: first determine, from the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account; determine the reputation value of the account from the communication records of each member of the core contacts group and of each member of the non-core contacts group; determine, from the reputation value of the account, whether the account is a rubbish account; and intercept the information sent by the rubbish account, thereby ensuring the recall ratio and precision ratio for junk information.
The technical solution of the present invention is further elaborated below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic flowchart of the information filtering method according to an embodiment of the present invention. As shown in Fig. 1, the information filtering method of the embodiment includes:
Step 101: Determine the communication records of each account;
Here, the communication records include: each opposite end account of each account, the number of communications with each opposite end account, and the start time, end time, and duration of each communication.
Here, determining the communication records of each account mainly means deleting unnecessary records. For example, if the communication records in the embodiment are SMS/MMS records, step 101 may delete the records of messages an account sends to itself, the records of messages sent by business systems, and even the messages the user has subscribed to. As another example, if the communication records in the embodiment are telephone call records, step 101 may delete the call records with the operator and the call records between the account and another account belonging to the same user.
Here, the methods for determining the communication records of each account include the extraction, transformation and loading (ETL, Extraction Transformation Loading) method and/or the blacklist/whitelist method. Those skilled in the art can filter the communication records of the account with any of the existing ETL and blacklist/whitelist methods, which are not described further here.
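The record fields and the clean-up in step 101 can be pictured with a minimal sketch. The `CommRecord` type, the `clean_records` function, and the idea of passing system accounts and subscriptions as sets are all assumptions made for illustration; the patent leaves the concrete ETL or blacklist/whitelist procedure to existing techniques.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommRecord:
    """One communication between two accounts (hypothetical layout)."""
    sender: str
    receiver: str
    start: float  # start time in seconds (assumed unit)
    end: float    # end time in seconds (assumed unit)

    @property
    def duration(self) -> float:
        return self.end - self.start


def clean_records(records, system_accounts=frozenset(), subscriptions=frozenset()):
    """Drop the records that step 101 describes as unnecessary."""
    kept = []
    for r in records:
        if r.sender == r.receiver:
            continue  # message an account sent to itself
        if r.sender in system_accounts or r.receiver in system_accounts:
            continue  # traffic with a business system or the operator
        if (r.sender, r.receiver) in subscriptions:
            continue  # content the receiver subscribed to
        kept.append(r)
    return kept


records = [
    CommRecord("A", "A", 0, 1),
    CommRecord("SYS", "A", 0, 1),
    CommRecord("NEWS", "A", 0, 1),
    CommRecord("A", "B", 0, 5),
]
kept = clean_records(records, system_accounts={"SYS"},
                     subscriptions={("NEWS", "A")})
```

Only the ordinary exchange between A and B survives the clean-up in this example.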
Step 102: Determine, from the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account;
Step 103: Determine the reputation value of the account from the communication records of each member of the core contacts group and the communication records of each member of the non-core contacts group;
Step 104: Determine, from the reputation value of the account, whether the account is a rubbish account;
Here, those skilled in the art may use a decision tree to determine, from the reputation value of the account, whether the account is a rubbish account.
Step 105: Intercept the information sent by the rubbish account.
Further, Fig. 2 is a schematic flowchart of a specific implementation of step 102 in Fig. 1. Determining the core contacts group and the non-core contacts group corresponding to the account from the communication records of each account includes:
Step 201: Determine the opposite end accounts Ap that communicate with account A;
Here, the opposite end accounts Ap include the accounts that account A calls when it is the calling account and the accounts that call account A when it is the called account. The accounts Ap form the contacts group of account A; p is a natural number greater than or equal to 1 and less than or equal to N, 1≤p≤N, and N is the total number of opposite end accounts that communicate with account A;
Step 202: Determine whether a correspondence exists between the opposite end accounts of account A;
Here, to determine the correspondence between any two opposite end accounts more precisely, a communication threshold for the number of communications between two opposite end accounts may be preset. The number of communications between the two opposite end accounts is then compared with the communication threshold. When the number of communications between the two opposite end accounts is greater than or equal to the communication threshold, a correspondence is determined to exist between them; otherwise, no correspondence is determined to exist between them.
Step 203: Assign the opposite end accounts that have a correspondence with another opposite end account to the core contacts group of account A; otherwise, assign the opposite end account to the non-core contacts group of account A.
Fig. 3 is a schematic structural diagram of a contacts group in an embodiment of the present invention. As shown in Fig. 3, the correspondence between account A and each of its opposite end accounts Ap is drawn as a solid line. Account A has 8 opposite end accounts in total, namely A1, A2, A3, A4, A5, A6, A7, and A8, so 1≤p≤8. Two opposite end accounts that communicate with each other are also joined by a solid line in Fig. 3. A correspondence exists between A1 and A2, between A3 and A4, between A5 and A6, and between A3 and A6; a correspondence here means that the number of communications between the two opposite end accounts is greater than or equal to the set communication threshold. Accordingly, A1, A2, A3, A4, A5, and A6 are determined to be the accounts of the core contacts group of account A. The opposite end accounts A7 and A8 communicate only with account A and not with the other opposite end accounts, or the number of communications between A7 or A8 and the other opposite end accounts is less than the set communication threshold. A7 and A8 therefore cannot belong to the core contacts group of account A and instead form the non-core contacts group of account A.
For example, suppose the contacts group of opposite end account A7 and the contacts group of account A share the common contact accounts A8 and A1, and the number of communications between A7 and A8 and between A7 and A1 are both less than the set communication threshold. Although the two contacts groups then have 2 common contact accounts, A7 still does not belong to the core contacts group of account A. The communication information in the embodiment shown in Fig. 3 is interactive service data, and such interactive service data ensures the credibility of an account's contacts group and core contacts group.
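The partition described for Fig. 3 can be reproduced with a short sketch. The function name `split_contacts`, the dictionary layout of the pairwise counts, and every number below are invented for illustration; only the rule itself (two opposite end accounts whose mutual count reaches the threshold both join the core contacts group) comes from the text.

```python
def split_contacts(account, counts, threshold):
    """Split the peers of `account` into (core, non_core) sets.

    counts maps frozenset({x, y}) -> number of communications between x and y.
    """
    # Contacts group: every account that communicated with `account`.
    peers = {next(iter(pair - {account})) for pair in counts if account in pair}
    core = set()
    for pair, n in counts.items():
        # Two peers have a correspondence when their mutual count reaches
        # the threshold; both then belong to the core contacts group.
        if account not in pair and pair <= peers and n >= threshold:
            core |= pair
    return core, peers - core


# Hypothetical counts arranged like Fig. 3 (the numbers are invented).
counts = {frozenset(("A", f"A{i}")): 1 for i in range(1, 9)}
for x, y in [("A1", "A2"), ("A3", "A4"), ("A5", "A6"), ("A3", "A6")]:
    counts[frozenset((x, y))] = 5
counts[frozenset(("A7", "A8"))] = 1  # below the threshold
counts[frozenset(("A7", "A1"))] = 1  # below the threshold

core, non_core = split_contacts("A", counts, threshold=3)
```

With a threshold of 3, `core` holds A1 to A6 and `non_core` holds A7 and A8, matching the description of Fig. 3: the two sub-threshold links of A7 do not promote it.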
In practical applications, the core contacts group corresponding to an account is generally determined with a distributed computation model or a cloud computing platform. Taking a cloud computing platform as an example, the following are the steps by which step 102 of the embodiment determines the core contacts group corresponding to each account from its communication records when the distribution and merge model of cloud computing is used.
Step 401: With the account as the key, write the communication records of each account as one line of data;
Specifically, the communication records of account A are written to a file in the format <account A, opposite end accounts A1 ... Ap ..., AN> and become one line of data in the file, where 1≤p≤N and N is the total number of opposite end accounts that communicate with account A; and
the communication records of each opposite end account Ap of account A are likewise written to the file in the format <account Ap, opposite end accounts Ap1 ... Aps ..., ApT> and become one line of data in the file, where 1≤s≤T and T is the total number of opposite end accounts that communicate with the opposite end account Ap of account A;
Step 402: In the distribution model of cloud computing, with the account as the key, distribute the opposite end accounts of the account, and the opposite end accounts of those opposite end accounts, to the merge model;
For example, in the distribution model of cloud computing, when the account is A, the opposite end accounts Ap of account A are distributed to the merge model, where 1≤p≤N and N is the total number of opposite end accounts that communicate with account A;
As another example, in the distribution model of cloud computing, when the account is Ap, the opposite end accounts Aps of account Ap are distributed to the merge model, where 1≤s≤T and T is the total number of opposite end accounts that communicate with account Ap;
Step 403: Compare each opposite end account Ap of account A with the opposite end accounts Aps of each opposite end account, where 1≤p≤N, 1≤s≤T, N is the total number of opposite end accounts that communicate with account A, and T is the total number of opposite end accounts that communicate with the opposite end account Ap. If an Aps is identical to one of the opposite end accounts of account A, the opposite end account Ap is assigned to the core contacts group of account A.
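Steps 401 to 403 resemble a map-and-reduce pass, and a single-process sketch can make the data flow concrete. The function names and the in-memory `defaultdict` standing in for the merge model are assumptions; a real deployment would run on the cloud platform's own framework. As in step 403 as written, the sketch checks only whether a peer's peer is also a peer of A and does not apply the communication threshold.

```python
from collections import defaultdict


def distribute(lines):
    """Distribution step: emit (account, peer) pairs keyed by account."""
    for account, peers in lines:
        for peer in peers:
            yield account, peer


def merge(pairs):
    """Merge step: collect the set of peers for every key."""
    grouped = defaultdict(set)
    for account, peer in pairs:
        grouped[account].add(peer)
    return grouped


def core_group(account, grouped):
    """Step 403: Ap is core if one of its peers Aps is also a peer of A."""
    peers = grouped.get(account, set())
    return {p for p in peers if grouped.get(p, set()) & peers}


# One line per account, as in step 401 (hypothetical data).
lines = [
    ("A", ["A1", "A2", "A7"]),
    ("A1", ["A", "A2"]),
    ("A2", ["A", "A1"]),
    ("A7", ["A"]),
]
grouped = merge(distribute(lines))
core = core_group("A", grouped)
```

Here A1 and A2 appear in each other's lines and in A's line, so both land in the core contacts group, while A7 talks only to A and stays out.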
Preferably, in step 103 above, the reputation value of the account is determined from the communication records of each member of the core contacts group and the communication records of each member of the non-core contacts group as follows:
In Fig. 3, each solid line in the contacts group of account A represents a correspondence between accounts. This correspondence also carries the degree of belief τ(A, Ap) of the communication between account A and account Ap. The rules for the degree of belief τ(A, Ap) are described first:
Rule one: When the opposite end account Ap that communicates with account A belongs to the core contacts group of account A, the relationship between account A and opposite end account Ap is a positive correspondence and can be given a positive degree of belief. Specifically, the degree of belief between account A and opposite end account Ap can be determined using formula (1):
In formula (1), R(Ap) denotes the reputation value of opposite end account Ap, R(Ap)=1, and k denotes the number of accounts in the core contacts group of Ap with which Ap communicates.
Continuing with the example of Fig. 3, suppose the opposite end account that communicates with account A is A1. In Fig. 3, A1 is an account of the core contacts group of account A, that is, the opposite end account A1 that communicates with account A belongs to the core contacts group of account A. Suppose also that the reputation value R(A1) of opposite end account A1 is 1 and that A1 communicates with 200 accounts in its core contacts group. The degree of belief between account A and opposite end account A1 determined using formula (1) is then as shown in formula (2):
As another example, suppose the opposite end account that communicates with account A is A2. In Fig. 3, A2 is an account of the core contacts group of account A, that is, the opposite end account A2 that communicates with account A belongs to the core contacts group of account A. Suppose also that the reputation value R(A2) of opposite end account A2 is 1 and that A2 communicates with 30 accounts in its core contacts group. The degree of belief between account A and opposite end account A2 determined using formula (1) is then as shown in formula (3):
Rule two: When the opposite end account Ap that communicates with account A does not belong to the core contacts group of account A, the relationship between account A and opposite end account Ap is a negative correspondence and can be given a negative degree of belief. Specifically, the degree of belief between account A and opposite end account Ap can be determined using formula (4):
In formula (4), R(Ap) denotes the reputation value of opposite end account Ap, R(Ap)=1; h denotes the number of accounts in the non-core contacts group of Ap with which Ap communicates; l denotes the number of times opposite end account Ap communicates with account A; and m denotes the relationship between opposite end account Ap and the other opposite end accounts of account A. m depends on the number f of contact accounts common to the contacts group of account A and the contacts group of opposite end account Ap, with m=f+1. If exactly one opposite end account of account A belongs to the contacts group of opposite end account Ap, then m=2; if two opposite end accounts of account A belong to the contacts group of opposite end account Ap, then m=3.
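The definition m=f+1 is a set intersection followed by an increment, which the following sketch spells out. The function name `relation_factor` and the example contact sets are hypothetical; the sketch assumes that an account is not listed in its own contacts group.

```python
def relation_factor(contacts_of_a, contacts_of_ap):
    """Return m = f + 1, where f is the number of contact accounts that the
    contacts group of A and the contacts group of Ap have in common."""
    f = len(set(contacts_of_a) & set(contacts_of_ap))
    return f + 1


# A7 shares A1 and A8 with A, so f = 2 and m = 3.
m_two_common = relation_factor({"A1", "A2", "A7", "A8"}, {"A", "A1", "A8"})
# No common contact at all gives f = 0 and m = 1.
m_none = relation_factor({"A1", "A2"}, {"A"})
```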
Continuing again with the example of Fig. 3, suppose the opposite end account that communicates with account A is A7. In Fig. 3, A7 is an account of the non-core contacts group of account A, that is, the opposite end account A7 that communicates with account A belongs to the non-core contacts group of account A. Suppose also that the reputation value R(A7) of opposite end account A7 is 1, that A7 communicates with 200 accounts in its non-core contacts group, that A7 communicates with account A 20 times, and that the contacts group of A7 and the contacts group of account A have 2 common contact accounts (see the description of Fig. 3), so m=3. The degree of belief between account A and opposite end account A7 determined according to formula (4) is then as shown in formula (5):
Continuing with the example of Fig. 3, suppose the opposite end account that communicates with account A is A8. In Fig. 3, A8 is an account of the non-core contacts group of account A, that is, the opposite end account A8 that communicates with account A belongs to the non-core contacts group of account A. Suppose also that the reputation value R(A8) of opposite end account A8 is 1, that A8 communicates with 20 accounts in its non-core contacts group, that A8 communicates with account A 20 times, and that the contacts group of A8 and the contacts group of account A have no common contact account, so m=1. The degree of belief between account A and opposite end account A8 determined according to formula (4) is then as shown in formula (6):
It should be noted that for accounts that have no core contacts group, for example newly registered accounts, rule two is still used to calculate the degree of belief of the communications of the newly registered account.
According to the two rules above, the reputation value of account A can be determined by formula (7):
In formula (7), R^(n+1)(A) denotes the reputation value of account A at the (n+1)th iteration; Ap is an opposite end account that communicates with account A, 0≤p≤N, and N is the total number of opposite end accounts that communicate with account A; τ(A, Ap) is the degree-of-belief evaluation of account A by opposite end account Ap; and ω denotes the damping coefficient, whose usual value is 0.85. The damping coefficient ensures that, after a finite number of iterations over the account reputation values, the computed R^(n+1)(A) comes close to the intrinsic reputation value of each account. The iteration initial value is R^(0)(A)=1.
Here, those skilled in the art can also use when determining the degree of belief between account A and opposite end account Ap
The various prior arts realize which is not described herein again.Moreover, in actual application process, the reputation of all accounts is determined
Distributed computation model or cloud computing platform are generally used when value, here, still by taking cloud computing platform as an example, to illustrate this hair
When step 103 uses distribution and the pooled model of cloud computing in bright embodiment, included specific steps:
Step 501, prepare 4 files:File 1 is the communications records of account, that is, each behavior in file 1 and account A
The opposite end account communicated;File 2 is that the core of account A associates the account in group, is had recorded per a line in file 2 any one
The number of a account and its core contacts group;File 3 is the message registration on the day of account, the format and file 1 of file input
Unanimously;File 4 is the initial reputation value of record same day opposite end account, and one number of each behavior in file 4 is corresponding
The initial reputation value of reputation value, each number is 1;
Step 502, in the cloud environment, use the account in file 1 as the key of the distribute model in cloud computing, and collect all opposite-end accounts into the merge model according to that key; in addition, because l in formula (4) needs to be calculated, the opposite-end account must also be used as a key in the distribute model, which guarantees that the number of communications with the account can be obtained;
Step 503, in the merge model of cloud computing, determine the reputation value of the account by formula (7) above;
Step 504, repeat step 503 iteratively to obtain the reputation value R^(n+1)(A) of the opposite-end account at the (n+1)-th iteration;
Step 505, determine the absolute value δ of the difference between the latest reputation values of all accounts and the previous reputation values of all accounts:
δ = |AVG(R^(n+1)) − AVG(R^(n))|    (8)
In formula (8), AVG denotes taking the average and | | denotes taking the absolute value; R^(n) denotes the reputation values of all accounts and is therefore a reputation value vector. If the difference between the average reputation value AVG(R^(n+1)) of all accounts at the (n+1)-th iteration and the average reputation value AVG(R^(n)) of all accounts at the n-th iteration is less than the specified threshold ε, or the number of iterations reaches the upper limit, the reputation value of each account is output; otherwise, the process jumps to step 402;
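The stopping rule of step 505 can be sketched as follows. The function names, the sample reputation values, and the choices of ε and the iteration limit are hypothetical; only the rule itself, δ = |AVG(R^(n+1)) − AVG(R^(n))| compared against a threshold or an iteration cap, comes from the text.

```python
from typing import Dict


def average_delta(previous: Dict[str, float], current: Dict[str, float]) -> float:
    """delta = |AVG(R^(n+1)) - AVG(R^(n))|, as in formula (8)."""
    avg_previous = sum(previous.values()) / len(previous)
    avg_current = sum(current.values()) / len(current)
    return abs(avg_current - avg_previous)


def should_stop(
    previous: Dict[str, float],
    current: Dict[str, float],
    epsilon: float,
    iteration: int,
    max_iterations: int,
) -> bool:
    """Stop when delta falls below epsilon or the iteration cap is reached."""
    return average_delta(previous, current) < epsilon or iteration >= max_iterations


# Hypothetical reputation vectors from two consecutive iterations.
r_n = {"A": 1.0, "B": 1.0, "C": 1.0}
r_n1 = {"A": 1.2, "B": 0.9, "C": 1.05}
delta = average_delta(r_n, r_n1)
```

With these sample values the averages are 1.0 and 1.05, so δ is about 0.05; the iteration would continue for ε = 0.01 unless the iteration limit has been reached.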
Further, in step 104 above, whether the account is a rubbish account is determined according to the reputation value of the account, or according to the reputation value of the account together with the number of communications of the account. The specific process is as follows:
Here, the embodiment of the present invention uses a decision-tree model based on the SPRINT algorithm to determine which accounts are rubbish accounts, and introduces the model into the cloud computing platform, including the following steps:
Step 601, sort the reputation values and determine the possible cut-points;
Here, because the reputation value attribute of an account is a continuous variable, the reputation values are first sorted in order to determine the possible cut-points.
Specifically, the reputation values can be sorted in ascending order. For example, to determine the possible cut-points of the natural-number sequence [1,10], namely 1,2,3,4,5,6,7,8,9,10, the average of each pair of adjacent numbers can be taken as a possible cut-point, so the possible cut-points are 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5 and 9.5. If 2.5 is selected as the cut-point, the sequence [1,10] is divided into [1,2] and [3,10]. Those skilled in the art can determine the cut-point according to various existing techniques, which are not described again here.
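The midpoint rule in the example above can be sketched in a few lines. The function name `candidate_cut_points` is hypothetical, and removing duplicate values before pairing neighbours is an added assumption rather than something the text states.

```python
from typing import List


def candidate_cut_points(values: List[float]) -> List[float]:
    """Sort the values ascending and return the average of each adjacent pair."""
    ordered = sorted(set(values))  # assumption: duplicates are collapsed first
    return [(low + high) / 2 for low, high in zip(ordered, ordered[1:])]


# The natural-number sequence [1,10] from the example.
points = candidate_cut_points(list(range(1, 11)))

# Choosing 2.5 splits the sequence into [1,2] and [3,10].
left = [v for v in range(1, 11) if v <= 2.5]
right = [v for v in range(1, 11) if v > 2.5]
```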
Step 602, in the distribute mode of cloud computing, use the cut-point as the key and distribute each row of data to the corresponding merge mode according to that key;
Here, continuing the example in step 601, suppose 5.5 is taken as the cut-point of [1,10]; cut-point 5.5 is then used as the key of [1,10], and the data are written as one row, for example:
5.5,1,2,3,4,5,6,7,8,9,10;
Step 603, in each merge mode, determine in parallel the Gini (gini) value of the corresponding cut-point;
Step 604, determine the minimum gini value over the reputation values, that is, determine the cut-point;
Step 605, determine the boundary between rubbish accounts and normal accounts by the cut-point;
Step 606, the model outputs the cut-points for the reputation value of the account and for the number of short messages sent, together with the tree classification rules under that split;
Step 607, after the decision-tree model has been established, the calculated reputation results can be input into the model to identify whether a number is one that sends junk short messages.
Here, in steps 601 to 605, whether the account is a rubbish account can also be determined by the number of communications; the process is similar to that of determining whether the account is a rubbish account by the reputation value described above, and is not described again here.
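Steps 603 to 605 can be illustrated with the standard Gini index used by decision-tree splitting, gini(S) = 1 − Σ p_j², weighted over the two sides of a cut-point. The labelled sample data, the label names, and the assumption that low reputation corresponds to rubbish accounts are hypothetical and serve only to make the sketch runnable; the text does not give training data.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[float, str]  # (reputation value, label)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(samples: List[Sample], cut: float) -> float:
    """Weighted Gini value of splitting the samples at the given cut-point."""
    left = [label for value, label in samples if value <= cut]
    right = [label for value, label in samples if value > cut]
    n = len(samples)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_cut_point(samples: List[Sample]) -> float:
    """Evaluate every midpoint candidate and keep the one with minimum gini."""
    values = sorted({value for value, _ in samples})
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(cuts, key=lambda c: gini_split(samples, c))


# Hypothetical labelled accounts: (reputation value, label).
samples = [
    (0.2, "rubbish"), (0.3, "rubbish"), (0.4, "rubbish"),
    (1.1, "normal"), (1.3, "normal"), (1.6, "normal"),
]
cut = best_cut_point(samples)
```

In a distributed setting each candidate cut-point would be evaluated in its own merge task, as step 603 describes; the sequential `min` here stands in for that parallel evaluation.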
Fig. 4 is a schematic structural diagram of the information filtering device according to an embodiment of the present invention. As shown in Fig. 4, the device includes a first determination unit 701, a second determination unit 702 and a third determination unit 703, wherein
the first determination unit 701 is configured to determine, according to the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account;
the second determination unit 702 is configured to determine the reputation value of the account according to the communication records of each member in the core contacts group and the communication records of each member in the non-core contacts group;
the third determination unit 703 is configured to determine whether the account is a rubbish account according to the reputation value of the account.
Further, the device also includes a fourth determination unit 704 and an interception unit 705, wherein
the fourth determination unit 704 is configured to determine the communication records of each account by data extraction, transformation and loading (ETL) and/or a black-and-white-list method;
the interception unit 705 is configured to intercept the information sent by the rubbish account.
Preferably, the second determination unit specifically determines the reputation value of the account by the following formula:
In the above formula, R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is an opposite-end account that communicates with account A, where 0≤p≤N and N is the total number of opposite-end accounts communicating with account A; ω denotes the damping coefficient; the iteration initial value is R^(0)(A) = 1; τ(A, Ap) is the trust evaluation of account A by opposite-end account Ap. When the opposite-end account Ap communicating with account A belongs to the core contacts group of account A, the degree of trust between account A and opposite-end account Ap is determined by the following formula:
In the above formula, R(Ap) denotes the reputation value of opposite-end account Ap, with R(Ap) = 1; k denotes the number of accounts in its core contacts group with which opposite-end account Ap communicates;
When the opposite-end account Ap communicating with account A does not belong to the core contacts group of account A, the degree of trust between account A and opposite-end account Ap is determined by the following formula:
In the above formula, R(Ap) denotes the reputation value of opposite-end account Ap, with R(Ap) = 1; h denotes the number of accounts in its non-core contacts group with which opposite-end account Ap communicates; l denotes the number of times opposite-end account Ap communicates with account A; m denotes the relationship between opposite-end account Ap and the other opposite-end accounts of account A: m is related to the number f of common contact accounts shared by the contacts group of account A and the contacts group of opposite-end account Ap, with m = f + 1.
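The quantity m = f + 1 can be computed directly from two contact sets. The sketch below assumes each contacts group is available as a set of account identifiers and that f is simply the size of their intersection; the function name and sample accounts are hypothetical, and the trust formulas that consume m are not reproduced here.

```python
from typing import Set


def common_contact_factor(contacts_of_a: Set[str], contacts_of_peer: Set[str]) -> int:
    """Return m = f + 1, where f is the number of contact accounts shared by
    the contacts group of account A and the contacts group of Ap."""
    f = len(contacts_of_a & contacts_of_peer)
    return f + 1


# Hypothetical contacts groups: C and D are common to both, so f = 2.
m = common_contact_factor({"B", "C", "D"}, {"C", "D", "E"})
```

Adding 1 keeps m at least 1 even when the two accounts share no contacts.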
Fig. 5 is a schematic structural diagram of the first determination unit in Fig. 4. As shown in Fig. 5, the first determination unit further includes a first determination subunit 801, a second determination subunit 802 and a third determination subunit 803, wherein
the first determination subunit 801 is configured to determine the opposite-end accounts Ap that communicate with account A, where 0≤p≤N and N is the total number of accounts communicating with account A;
the second determination subunit 802 is configured to determine whether a communication relationship exists between the opposite-end accounts of the account;
the third determination subunit 803 is configured to determine the opposite-end accounts that have a communication relationship with other opposite-end accounts as the core contacts group of account A, and otherwise to determine the opposite-end account as belonging to the non-core contacts group of account A.
Fig. 6 is a schematic structural diagram of the second determination subunit in Fig. 5. As shown in Fig. 6, the second determination subunit further includes a setting module 901, a judgment module 902 and a determining module 903, wherein
the setting module 901 is configured to preset a communication threshold for the number of communications between any two opposite-end accounts;
the judgment module 902 is configured to judge the relationship between the number of communications between any two opposite-end accounts and the communication threshold;
the determining module 903 is configured to determine that a communication relationship exists between two opposite-end accounts when the judgment result is that the number of communications between them is greater than or equal to the communication threshold, and otherwise to determine that no communication relationship exists between the two opposite-end accounts.
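The division into core and non-core contacts groups described for subunits 801 to 803 and modules 901 to 903 can be sketched as follows. The data layout (a dictionary of pairwise communication counts), the function name, and the sample values are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def split_contact_groups(
    peers: Set[str],
    pair_counts: Dict[FrozenSet[str], int],
    threshold: int,
) -> Tuple[Set[str], Set[str]]:
    """Split the opposite-end accounts of one account into (core, non_core).

    Two opposite-end accounts have a communication relationship when their
    number of communications is >= threshold; any opposite-end account with
    such a relationship is placed in the core contacts group.
    """
    core: Set[str] = set()
    for p, q in combinations(sorted(peers), 2):
        if pair_counts.get(frozenset((p, q)), 0) >= threshold:
            core.update((p, q))
    return core, peers - core


# Hypothetical data: opposite-end accounts of account A and pairwise counts.
peers_of_a = {"B", "C", "D", "E"}
pair_counts = {frozenset(("B", "C")): 5, frozenset(("C", "D")): 1}
core, non_core = split_contact_groups(peers_of_a, pair_counts, threshold=3)
```

Here B and C communicate five times, which meets the threshold of 3, so both join the core contacts group; D and E fall into the non-core contacts group.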
In practical applications, the information filtering device of the embodiment of the present invention is generally implemented using a distributed computing model or a cloud computing platform. Here, a cloud computing platform is taken as an example to illustrate the structure of the information filtering device in the embodiment of the present invention. Fig. 7 is a schematic structural diagram of the information filtering device when the embodiment of the present invention uses a cloud computing platform. As shown in Fig. 7, when a cloud computing platform is used, the information filtering device includes a cloud platform layer 1000 and a data analysis layer 1010, wherein
the cloud platform layer 1000 further includes a first communication record unit 1001, a second communication record unit 1002 and a core contacts group unit 1003, wherein
the first communication record unit 1001 is configured to store the historical communication records and the current day's communication records of all accounts;
the second communication record unit 1002 is configured to store the filtered communication records of all accounts;
the core contacts group unit 1003 is configured to store the core contacts groups of all accounts;
the data analysis layer 1010 further includes a communication record determination unit 1011, a core contacts group determination unit 1012, a reputation value determination unit 1013, an account recognition unit 1014 and a decision tree generation unit 1015, wherein
the communication record determination unit 1011 is configured to filter the communication records of each account stored in the first communication record unit 1001, and to output the filtered communication records of each account to the second communication record unit 1002;
the core contacts group determination unit 1012 is configured to determine the core contacts group corresponding to the account according to the communication records of each account stored in the second communication record unit 1002, and to output the determined core contacts group of the account to the core contacts group unit 1003;
the reputation value determination unit 1013 is configured to determine the reputation value of the account according to the reputation value of each member in the core contacts group and the reputation value of each member in the non-core contacts group, and to output the reputation value to the account recognition unit 1014;
Here, the task of determining the reputation values of accounts is executed in parallel using the distribute-merge model of the cloud platform. During the iterative calculation of account reputation values, each iteration loads the reputation values determined in the previous round. In the merge phase, all distributed results are merged to obtain the result R of the new round. The result R of each round can be stored in a fixed location. When the difference between the results of two consecutive rounds is less than some threshold, or the iterations reach a specified number, R is considered to have reached a stable value and the final reputation value of each number is obtained.
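The distribute and merge phases can be pictured as key-based grouping, in the spirit of step 502: records are emitted under the account as key and also under the opposite-end account as key, then collected per key. The record layout, the function names, and the sample data below are hypothetical, and the sketch runs in a single process rather than on a cloud platform.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (account, opposite-end account) for one communication


def distribute(records: Iterable[Record]) -> Iterator[Tuple[str, str]]:
    """Distribute phase: emit each record under both accounts as keys."""
    for account, peer in records:
        yield account, peer  # keyed by the account
        yield peer, account  # keyed by the opposite-end account


def merge(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merge phase: collect all values that share the same key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical communication records: A contacted B twice and C once.
records = [("A", "B"), ("A", "B"), ("A", "C")]
grouped = merge(distribute(records))

# Number of times opposite-end account B communicated with account A.
times_b_with_a = Counter(grouped["B"])["A"]
```

Keying by the opposite-end account as well is what makes the per-pair communication count available on the opposite-end account's side of the merge.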
the account recognition unit 1014 is configured to determine whether the account is a rubbish account according to the reputation value of the account, and to output the recognition result to the decision tree generation unit 1015;
the decision tree generation unit 1015 is configured to train the decision tree according to the recognition result of the account recognition unit 1014.
Further, the data analysis layer 1010 in the embodiment of the present invention may also include a junk information interception unit, configured to intercept the information sent by the rubbish account.
If the above integrated units of the present invention are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention.
Claims (8)
1. An information filtering method, characterized in that the method includes:
determining, according to the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account;
determining the reputation value R^(n+1)(A) of the account according to the communication records of each member in the core contacts group and the communication records of each member in the non-core contacts group, including:
wherein R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is an opposite-end account that communicates with account A, 0≤p≤N, and N is the total number of opposite-end accounts communicating with account A; ω denotes the damping coefficient; the iteration initial value is R^(0)(A) = 1; τ(A, Ap) is the trust evaluation of account A by opposite-end account Ap;
when the opposite-end account Ap communicating with account A belongs to the core contacts group of account A, the degree of trust between account A and opposite-end account Ap is determined by the following formula:
wherein R(Ap) denotes the reputation value of opposite-end account Ap, R(Ap) = 1, and k denotes the number of accounts in its core contacts group with which opposite-end account Ap communicates;
when the opposite-end account Ap communicating with account A does not belong to the core contacts group of account A, the degree of trust between account A and opposite-end account Ap is determined by the following formula:
wherein R(Ap) denotes the reputation value of opposite-end account Ap, R(Ap) = 1; h denotes the number of accounts in its non-core contacts group with which opposite-end account Ap communicates; l denotes the number of times opposite-end account Ap communicates with account A; m denotes the relationship between opposite-end account Ap and the other opposite-end accounts of account A: m is related to the number f of common contact accounts shared by the contacts group of account A and the contacts group of opposite-end account Ap, and m = f + 1;
determining whether the account is a rubbish account according to the reputation value of the account;
when the account is a rubbish account, intercepting the information sent by the account.
2. The method according to claim 1, characterized in that determining, according to the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account includes:
for each account A:
determining the opposite-end accounts Ap that communicate with account A, wherein 0≤p≤N and N is the total number of accounts communicating with account A;
determining whether a communication relationship exists between the opposite-end accounts of account A;
determining the opposite-end accounts that have a communication relationship with other opposite-end accounts as the core contacts group of account A, and otherwise determining the opposite-end account as belonging to the non-core contacts group of account A.
3. The method according to claim 2, characterized in that determining whether a communication relationship exists between the opposite-end accounts of account A includes:
judging the relationship between the number of communications between any two opposite-end accounts and a preset communication threshold;
when the judgment result is that the number of communications between any two opposite-end accounts is greater than or equal to the communication threshold, determining that a communication relationship exists between the two opposite-end accounts; otherwise, determining that no communication relationship exists between the two opposite-end accounts.
4. The method according to claim 2, characterized in that, before determining the core contacts group corresponding to the account according to the communication records of each account, the method further includes:
determining the communication records of each account by data extraction, transformation and loading (ETL) and/or a black-and-white-list method.
5. An information filtering device, characterized in that the device includes a first determination unit, a second determination unit, a third determination unit and an interception unit, wherein
the first determination unit is configured to determine, according to the communication records of each account, the core contacts group and the non-core contacts group corresponding to the account;
the second determination unit is configured to determine the reputation value R^(n+1)(A) of the account according to the communication records of each member in the core contacts group and the communication records of each member in the non-core contacts group, including:
wherein R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is an opposite-end account that communicates with account A, 0≤p≤N, and N is the total number of opposite-end accounts communicating with account A; ω denotes the damping coefficient; the iteration initial value is R^(0)(A) = 1; τ(A, Ap) is the trust evaluation of account A by opposite-end account Ap;
when the opposite-end account Ap communicating with account A belongs to the core contacts group of account A, the degree of trust between account A and opposite-end account Ap is determined by the following formula:
wherein R(Ap) denotes the reputation value of opposite-end account Ap, R(Ap) = 1, and k denotes the number of accounts in its core contacts group with which opposite-end account Ap communicates;
when the opposite-end account Ap communicating with account A does not belong to the core contacts group of account A, the degree of trust between account A and opposite-end account Ap is determined by the following formula:
wherein R(Ap) denotes the reputation value of opposite-end account Ap, R(Ap) = 1; h denotes the number of accounts in its non-core contacts group with which opposite-end account Ap communicates; l denotes the number of times opposite-end account Ap communicates with account A; m denotes the relationship between opposite-end account Ap and the other opposite-end accounts of account A: m is related to the number f of common contact accounts shared by the contacts group of account A and the contacts group of opposite-end account Ap, and m = f + 1;
the third determination unit is configured to determine whether the account is a rubbish account according to the reputation value of the account;
the interception unit is configured to intercept the information sent by the rubbish account.
6. The device according to claim 5, characterized in that the first determination unit further includes a first determination subunit, a second determination subunit and a third determination subunit, wherein
the first determination subunit is configured to determine the opposite-end accounts Ap that communicate with account A, wherein 0≤p≤N and N is the total number of accounts communicating with account A;
the second determination subunit is configured to determine whether a communication relationship exists between the opposite-end accounts of the account;
the third determination subunit is configured to determine the opposite-end accounts that have a communication relationship with other opposite-end accounts as the core contacts group of account A, and otherwise to determine the opposite-end account as belonging to the non-core contacts group of account A.
7. The device according to claim 6, characterized in that the second determination subunit further includes a setting module, a judgment module and a determining module, wherein
the setting module is configured to preset a communication threshold for the number of communications between any two opposite-end accounts;
the judgment module is configured to judge the relationship between the number of communications between any two opposite-end accounts and the communication threshold;
the determining module is configured to determine that a communication relationship exists between two opposite-end accounts when the judgment result is that the number of communications between them is greater than or equal to the communication threshold, and otherwise to determine that no communication relationship exists between the two opposite-end accounts.
8. The device according to any one of claims 5 to 7, characterized in that the device further includes a fourth determination unit, wherein
the fourth determination unit is configured to determine the communication records of each account by data extraction, transformation and loading (ETL) and/or a black-and-white-list method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310403218.7A CN104427503B (en) | 2013-09-06 | 2013-09-06 | A kind of information filtering method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104427503A CN104427503A (en) | 2015-03-18 |
CN104427503B true CN104427503B (en) | 2018-09-07 |
Family ID: 52975204
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310403218.7A Active CN104427503B (en) | 2013-09-06 | 2013-09-06 | A kind of information filtering method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104427503B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106304084B (en) * | 2016-08-15 | 2019-10-29 | 成都九鼎瑞信科技股份有限公司 | Information processing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101557441B (en) * | 2009-05-12 | 2011-11-30 | 成都市华为赛门铁克科技有限公司 | Method and device for call filtering |
CN102547712A (en) * | 2011-12-09 | 2012-07-04 | 成都市华为赛门铁克科技有限公司 | Method and equipment for detecting junk incoming call |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102857921B (en) * | 2011-06-30 | 2016-03-30 | 国际商业机器公司 | Judge method and the device of spammer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||