CN104427503B - Information filtering method and device - Google Patents

Info

Publication number
CN104427503B
Authority
CN
China
Prior art keywords
account
opposite end
communicated
group
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310403218.7A
Other languages
Chinese (zh)
Other versions
CN104427503A (en)
Inventor
祝希路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Hunan Co Ltd
Original Assignee
China Mobile Group Hunan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Hunan Co Ltd
Priority to CN201310403218.7A
Publication of CN104427503A
Application granted
Publication of CN104427503B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/02: Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227: Filtering policies
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12: Detection or prevention of fraud
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12: Messaging; Mailboxes; Announcements

Abstract

The invention discloses an information filtering method, comprising: determining, from the communication records of each account, the core contact group and the non-core contact group corresponding to that account; determining the reputation value of the account from the communication records of each member of the core contact group and of each member of the non-core contact group; and determining from the reputation value whether the account is a junk account. The invention also discloses an information filtering device. With this technical solution, information sent by junk accounts is intercepted, which ensures the recall and precision of junk-information filtering.

Description

An information filtering method and device
Technical field
The present invention relates to information filtering technology, and in particular to an information filtering method and device.
Background art
The proliferation of spam text messages, advertisements and similar information is a major problem for users and mobile operators, so such information must be identified and filtered. The filtering approaches widely used by operators today fall into two classes, user-level filtering and message-level filtering, whose objects of processing are, respectively, an individual sending user and an individual message.
User-level filtering mainly comprises frequency-threshold filtering and the black/white-list method. Frequency-threshold filtering sets a frequency threshold during interception. Because the threshold is set mainly from experience, subjective factors cause many legitimate messages to be misjudged and many junk messages to be missed, so the method is not very effective. The black/white-list method overcomes this defect, but the lists must be maintained by hand, which is time-consuming and laborious; moreover, the lists grow ever larger in use, which affects the timeliness of message delivery.
Message-level filtering mainly comprises keyword filtering. Keyword filtering requires the dictionary to be updated continually, and because keywords are hard to select, there is no guarantee that all junk keywords are caught. Moreover, keyword matching can hardly judge whether the content of a message is legitimate, so misjudgments are likely, and a keyword list is easily bypassed by using pinyin, deliberately miswritten characters, homophones, or symbols inserted into the text. An information filtering method and device with high recall and precision for junk information is therefore urgently needed.
Summary of the invention
In view of this, the main purpose of the present invention is to provide an information filtering method and device that intercept the information sent by junk accounts, thereby ensuring the recall and precision of junk-information filtering.
To achieve the above purpose, the technical solution of the present invention is realized as follows:
The present invention provides an information filtering method, comprising: determining, from the communication records of each account, the core contact group and the non-core contact group corresponding to the account; determining the reputation value of the account from the communication records of each member of the core contact group and of each member of the non-core contact group; and determining from the reputation value of the account whether the account is a junk account.
In the above solution, determining the core contact group and the non-core contact group corresponding to the account from the communication records of each account comprises, for each account A: determining the peer accounts Ap that communicate with account A, where 0≤p≤N and N is the total number of accounts communicating with account A; determining whether a communication relationship exists between the peer accounts of account A; assigning each peer account that has a communication relationship with another peer account to the core contact group of account A; and assigning the remaining peer accounts to the non-core contact group of account A.
In the above solution, determining whether a communication relationship exists between the peer accounts of account A comprises: comparing the number of communications between any two peer accounts with a preset communication threshold; when the number of communications between the two peer accounts is greater than or equal to the communication threshold, determining that a communication relationship exists between them; otherwise, determining that no communication relationship exists between them.
In the above solution, before the core contact group corresponding to the account is determined from the communication records of each account, the method further comprises: determining the communication records of each account by extract-transform-load (ETL) processing and/or a black/white-list method.
In the above solution, determining the reputation value R^(n+1)(A) of the account from the communication records of each member of the core contact group and of each member of the non-core contact group comprises an iterative calculation in which: R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is a peer account communicating with account A, 0≤p≤N, and N is the total number of peer accounts communicating with account A; ω denotes the damping coefficient; the initial iteration value is R^(0)(A)=1; and τ(A, Ap) is the trust evaluation of account A by peer account Ap;
When the peer account Ap communicating with account A belongs to the core contact group of account A, the degree of trust between account A and peer account Ap is determined by a formula in which R(Ap) denotes the reputation value of peer account Ap, R(Ap)=1, and k denotes the number of accounts in the core contact group of Ap with which Ap communicates. When the peer account Ap communicating with account A does not belong to the core contact group of account A, the degree of trust between account A and peer account Ap is determined by a formula in which R(Ap) denotes the reputation value of peer account Ap, R(Ap)=1; h denotes the number of accounts in the non-core contact group of Ap with which Ap communicates; l denotes the number of communications between Ap and account A; and m expresses the relationship between Ap and the other peer accounts of account A: m depends on the number f of contact accounts shared by the contact group of account A and the contact group of Ap, with m=f+1.
In the above solution, the method further comprises: intercepting the information sent by the junk account.
The present invention also provides an information filtering device comprising a first determination unit, a second determination unit and a third determination unit, wherein: the first determination unit is configured to determine, from the communication records of each account, the core contact group and the non-core contact group corresponding to the account; the second determination unit is configured to determine the reputation value of the account from the communication records of each member of the core contact group and of each member of the non-core contact group; and the third determination unit is configured to determine from the reputation value of the account whether the account is a junk account.
In the above solution, the first determination unit further comprises a first determination subunit, a second determination subunit and a third determination subunit, wherein: the first determination subunit is configured to determine the peer accounts Ap that communicate with account A, where 0≤p≤N and N is the total number of accounts communicating with account A; the second determination subunit is configured to determine whether a communication relationship exists between the peer accounts of the account; and the third determination subunit is configured to assign each peer account that has a communication relationship with another peer account to the core contact group of account A, and to assign the remaining peer accounts to the non-core contact group of account A.
In the above solution, the second determination subunit further comprises a setting module, a judgment module and a determination module, wherein: the setting module is configured to preset the communication threshold for the number of communications between any two peer accounts; the judgment module is configured to compare the number of communications between any two peer accounts with the communication threshold; and the determination module is configured to determine that a communication relationship exists between two peer accounts when the number of communications between them is greater than or equal to the communication threshold, and otherwise to determine that no communication relationship exists between them.
In the above solution, the device further comprises a fourth determination unit and an interception unit, wherein: the fourth determination unit is configured to determine the communication records of each account by extract-transform-load (ETL) processing and/or a black/white-list method; and the interception unit is configured to intercept the information sent by the junk account.
The information filtering method and device provided by the present invention determine, from the communication records of each account, the core contact group and the non-core contact group corresponding to the account; determine the reputation value of the account from the communication records of each member of the core contact group and of each member of the non-core contact group; determine from the reputation value whether the account is a junk account; and intercept the information sent by the junk account, thereby ensuring the recall and precision of junk-information filtering;
Further, the present invention determines the members of the core contact group of an account from the communication records of the account. Each account in the core contact group is determined by communication records existing among at least three accounts, which avoids the incorrect selection of the core contact group that the frequency-threshold setting of conventional methods can cause;
Further, the present invention moves the determination of core contact groups and reputation values onto a cloud computing platform, which saves storage and computing resources. Using the distribution and merge computation model of cloud computing, the calculation runs in a distributed, parallel fashion on the cloud platform, which ensures efficient processing.
Description of the drawings
Fig. 1 is a flowchart of the information filtering method according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step 102 in Fig. 1;
Fig. 3 is a schematic diagram of the structure of a contact group in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the composition of the information filtering device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the composition of the first determination unit in Fig. 4;
Fig. 6 is a schematic diagram of the composition of the second determination subunit in Fig. 5;
Fig. 7 is a schematic diagram of the composition of the information filtering device when an embodiment of the present invention uses a cloud computing platform.
Detailed description
The basic idea of the present invention is: first determine, from the communication records of each account, the core contact group and the non-core contact group corresponding to the account; determine the reputation value of the account from the communication records of each member of the core contact group and of each member of the non-core contact group; determine from the reputation value whether the account is a junk account; and intercept the information sent by the junk account, thereby ensuring the recall and precision of junk-information filtering.
The technical solution of the present invention is further elaborated below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the information filtering method according to an embodiment of the present invention. As shown in Fig. 1, the information filtering method of the embodiment comprises:
Step 101: determine the communication records of each account;
Here, the communication records include: each peer account of each account, the number of communications with each peer account, and the start time, end time and duration of each communication.
Here, determining the communication records of each account mainly means deleting unnecessary records. For example, if the communication records in the embodiment are SMS/MMS records, step 101 may delete records of messages a user sent to himself or herself, records sent by business systems, and even messages the user subscribed to. As another example, if the communication records are call records, step 101 may delete call records with the operator and call records between the account and another account belonging to the same user.
Here, the methods for determining the communication records of each account include extract-transform-load (ETL, Extraction Transformation Loading) processing and/or a black/white-list method. Those skilled in the art can filter the communication records of the account using any of the various existing ETL and black/white-list methods, which are not described further here.
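The record clean-up of step 101 can be sketched as follows. This is only a minimal illustration in Python, not the ETL or list-based method of the embodiment; the `Record` type, the `clean_records` function and the sample system account `"10086"` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One communication record (hypothetical minimal shape)."""
    sender: str
    receiver: str


def clean_records(records, system_accounts):
    """Drop self-addressed records and records involving system accounts.

    `system_accounts` stands in for operator or business-system numbers;
    how such a list is obtained is outside the scope of this sketch.
    """
    return [
        r for r in records
        if r.sender != r.receiver
        and r.sender not in system_accounts
        and r.receiver not in system_accounts
    ]


records = [Record("A", "A"), Record("10086", "A"), Record("A", "B")]
cleaned = clean_records(records, system_accounts={"10086"})
```

Only the record between the two ordinary accounts survives; the self-addressed record and the record from the system account are dropped.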
Step 102: determine, from the communication records of each account, the core contact group and the non-core contact group corresponding to the account;
Step 103: determine the reputation value of the account from the communication records of each member of the core contact group and of each member of the non-core contact group;
Step 104: determine from the reputation value of the account whether the account is a junk account;
Here, those skilled in the art may use a decision tree to determine, from the reputation value of the account, whether the account is a junk account.
Step 105: intercept the information sent by the junk account.
Further, Fig. 2 is a detailed flowchart of step 102 in Fig. 1. Determining the core contact group and the non-core contact group corresponding to the account from the communication records of each account comprises:
Step 201: determine the peer accounts Ap that communicate with account A;
Here, the peer accounts Ap include the accounts that account A calls when it is the calling party and the accounts that call account A when it is the called party; the accounts Ap form the contact group of account A. p is a natural number with 1≤p≤N, where N is the total number of peer accounts communicating with account A;
Step 202: determine whether a communication relationship exists between the peer accounts of account A;
Here, to determine the communication relationship between any two peer accounts more reliably, a communication threshold for the number of communications between two peer accounts can be preset. The number of communications between each pair of peer accounts is then compared with the communication threshold. When the number of communications between the two peer accounts is greater than or equal to the communication threshold, a communication relationship is determined to exist between them; otherwise, no communication relationship is determined to exist between them.
Step 203: assign each peer account that has a communication relationship with another peer account to the core contact group of account A; assign the remaining peer accounts to the non-core contact group of account A.
Fig. 3 is a schematic diagram of the structure of a contact group in an embodiment of the present invention. As shown in Fig. 3, the communication relationship between account A and each of its peer accounts Ap is drawn as a solid line. Account A has 8 peer accounts in total, namely A1, A2, A3, A4, A5, A6, A7 and A8, with 1≤p≤8. Two peer accounts that communicate with each other are also joined by a solid line in Fig. 3: a communication relationship exists between A1 and A2, between A3 and A4, between A5 and A6, and between A3 and A6. A communication relationship here means that the number of communications between the two peer accounts is greater than or equal to the set communication threshold. Peer accounts A1, A2, A3, A4, A5 and A6 are therefore the accounts of the core contact group of account A. Peer accounts A7 and A8 communicate only with account A and not with the other peer accounts, or the number of communications between A7 or A8 and the other peer accounts is below the set communication threshold. A7 and A8 therefore cannot form part of the core contact group of account A; they form the non-core contact group of account A.
For example, suppose the contact group of peer account A7 shares the contact accounts A8 and A1 with the contact group of account A, and the number of communications between A7 and A8 and between A7 and A1 are both below the set communication threshold. Then, although the contact group of A7 and the contact group of account A share 2 contact accounts, A7 does not belong to the core contact group of account A. The communication information in the embodiment shown in Fig. 3 is interactive service data, and such interactive service data ensures the credibility of an account's contact group and core contact group.
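Steps 201 to 203 can be sketched in Python using the Fig. 3 example. This is a minimal single-machine illustration under stated assumptions: the function names, the dictionary layout of `comm_counts`, the threshold of 3 and the individual communication counts are all hypothetical; only the resulting grouping follows the description of Fig. 3.

```python
def pair_key(x, y):
    """Order-independent key for a pair of accounts."""
    return tuple(sorted((x, y)))


def split_contacts(account, comm_counts, threshold):
    """Split the peers of `account` into (core, non_core) contact groups.

    comm_counts maps pair_key(x, y) to the number of communications
    between x and y. A peer is core if it has reached the threshold
    with at least one other peer of the same account.
    """
    # Step 201: every account that communicated with `account` is a peer.
    peers = {b if a == account else a
             for (a, b) in comm_counts if account in (a, b)}
    # Steps 202-203: compare peer-to-peer counts with the threshold.
    core = {
        p for p in peers
        if any(comm_counts.get(pair_key(p, q), 0) >= threshold
               for q in peers if q != p)
    }
    return core, peers - core


# Fig. 3: A communicates with A1..A8 (counts here are made up).
comm_counts = {pair_key("A", f"A{i}"): 1 for i in range(1, 9)}
comm_counts.update({
    pair_key("A1", "A2"): 5,
    pair_key("A3", "A4"): 4,
    pair_key("A5", "A6"): 3,
    pair_key("A3", "A6"): 6,
    pair_key("A7", "A8"): 1,  # below the threshold
    pair_key("A7", "A1"): 2,  # below the threshold
})
core, non_core = split_contacts("A", comm_counts, threshold=3)
```

With these inputs A1 to A6 land in the core contact group and A7 and A8 in the non-core contact group, matching the grouping described for Fig. 3.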
In practical applications, a distributed computing model or a cloud computing platform is generally used to determine the core contact group corresponding to an account. Taking a cloud computing platform as an example, the following describes the steps for determining the core contact group corresponding to the account from the communication records of each account when step 102 in the embodiment of the present invention uses the distribution and merge model of cloud computing.
Step 401: with the account as key, write the communication records of each account as one line of data;
Specifically, write the communication records of account A to a file in the format <account A, peer accounts A1 ... Ap ..., AN>, so that they become one line of data in the file, where 1≤p≤N and N is the total number of peer accounts communicating with account A; and
write the communication records of each peer account Ap of account A to the file in the format <account Ap, peer accounts Ap1 ... Aps ..., ApT>, so that they become one line of data in the file, where 1≤s≤T and T is the total number of peer accounts communicating with the peer account Ap of account A;
Step 402: in the distribution model of cloud computing, with the account as key, distribute the peer accounts of the account, and the peer accounts of those peer accounts, to the merge model;
For example, in the distribution model of cloud computing, when the account is A, the peer accounts Ap of account A are distributed to the merge model, where 1≤p≤N and N is the total number of peer accounts communicating with account A;
As another example, in the distribution model of cloud computing, when the account is Ap, the peer accounts Aps of account Ap are distributed to the merge model, where 1≤s≤T and T is the total number of peer accounts communicating with account Ap;
Step 403: compare each peer account Ap of account A with the peer accounts Aps of each peer account, where 1≤p≤N, 1≤s≤T, N is the total number of peer accounts communicating with account A, and T is the total number of peer accounts communicating with peer account Ap; if an Aps is identical to an Ap, the peer account Ap is assigned to the core contact group of account A.
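The distribution and merge steps 401 to 403 resemble a map/reduce pass keyed by account. The sketch below emulates that flow in plain, single-process Python; it does not use any real cloud platform API, and the function names `distribute`, `merge` and `core_group` are hypothetical. For brevity it treats any recorded communication between two peers as a communication relationship and leaves out the communication threshold.

```python
from collections import defaultdict


def distribute(rows):
    """Step 402: emit one (account, peer) pair per peer, keyed by account."""
    for account, peers in rows:
        for peer in peers:
            yield account, peer


def merge(pairs):
    """Collect all peers emitted under the same account key."""
    grouped = defaultdict(set)
    for key, peer in pairs:
        grouped[key].add(peer)
    return grouped


def core_group(account, grouped):
    """Step 403: a peer Ap is core if one of its own peers (Aps)
    is also another peer of `account`."""
    peers = grouped[account]
    return {p for p in peers if (grouped.get(p, set()) & peers) - {p}}


# Step 401: one line per account, <account, peer accounts...>.
rows = [
    ("A", ["A1", "A2", "A7"]),
    ("A1", ["A", "A2"]),
    ("A2", ["A", "A1"]),
    ("A7", ["A"]),
]
grouped = merge(distribute(rows))
core = core_group("A", grouped)
```

Here A1 and A2 are each other's peers and both are peers of A, so they form the core contact group, while A7 only communicates with A and stays outside it.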
Preferably, in step 103 above, the reputation value of the account is determined from the communication records of each member of the core contact group and of each member of the non-core contact group as follows:
In Fig. 3, each solid line in the contact group of account A indicates a communication relationship between accounts. This communication relationship also carries the degree of trust τ(A, Ap) of the communication between account A and account Ap. The rules for the degree of trust τ(A, Ap) are described first:
Rule one: when the peer account Ap communicating with account A belongs to the core contact group of account A, the relationship between account A and peer account Ap is a positive communication relationship and can be given a positive degree of trust. Specifically, the degree of trust between account A and peer account Ap can be determined using formula (1):
In formula (1), R(Ap) denotes the reputation value of peer account Ap, R(Ap)=1, and k denotes the number of accounts in the core contact group of Ap with which Ap communicates.
Continuing with the example of Fig. 3: suppose the peer account communicating with account A is A1. In Fig. 3, A1 is an account of the core contact group of account A, that is, the peer account A1 communicating with account A belongs to the core contact group of account A. Suppose further that the reputation value R(A1) of A1 is 1 and that the number of accounts in its core contact group with which A1 communicates is 200. The degree of trust between account A and peer account A1 determined using formula (1) is then as shown in formula (2):
As another example, suppose the peer account communicating with account A is A2. In Fig. 3, A2 is an account of the core contact group of account A, that is, the peer account A2 communicating with account A belongs to the core contact group of account A. Suppose further that the reputation value R(A2) of A2 is 1 and that the number of accounts in its core contact group with which A2 communicates is 30. The degree of trust between account A and peer account A2 determined using formula (1) is then as shown in formula (3):
Rule two: when the peer account Ap communicating with account A does not belong to the core contact group of account A, the relationship between account A and peer account Ap is a negative communication relationship and can be given a negative degree of trust. Specifically, the degree of trust between account A and peer account Ap can be determined using formula (4):
In formula (4), R(Ap) denotes the reputation value of peer account Ap, R(Ap)=1; h denotes the number of accounts in the non-core contact group of Ap with which Ap communicates; l denotes the number of communications between Ap and account A; and m expresses the relationship between Ap and the other peer accounts of account A: m depends on the number f of contact accounts shared by the contact group of account A and the contact group of Ap, with m=f+1. If exactly one peer account of account A belongs to the contact group of Ap, then m=2; if two peer accounts of account A belong to the contact group of Ap, then m=3.
Returning to the example of Fig. 3: suppose the peer account communicating with account A is A7. In Fig. 3, A7 is an account of the non-core contact group of account A, that is, the peer account A7 communicating with account A belongs to the non-core contact group of account A. Suppose further that the reputation value R(A7) of A7 is 1, that the number of accounts in its non-core contact group with which A7 communicates is 200, that the number of communications between A7 and account A is 20, and that the contact group of A7 and the contact group of account A share 2 contact accounts (see the description of Fig. 3), so that m=3. The degree of trust between account A and peer account A7 determined according to formula (4) is then as shown in formula (5):
Continuing with the example of Fig. 3, suppose the peer account communicating with account A is A8. In Fig. 3, A8 is an account of the non-core contact group of account A, that is, the peer account A8 communicating with account A belongs to the non-core contact group of account A. Suppose further that the reputation value R(A8) of A8 is 1, that the number of accounts in its non-core contact group with which A8 communicates is 20, that the number of communications between A8 and account A is 20, and that the contact group of A8 and the contact group of account A share no contact account, so that m=1. The degree of trust between account A and peer account A8 determined according to formula (4) is then as shown in formula (6):
It should be noted here that for accounts that have no core contact group, for example newly registered accounts, rule two is still used to calculate the degree of trust of communications with the newly registered account.
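The quantity m used in rule two depends only on set overlap, m=f+1, so it can be illustrated without the trust formulas themselves. The following Python sketch is an assumption-laden illustration: the function name `shared_contact_factor` and the account "B" are hypothetical, and the contact sets for A7 follow the supposition made in the description of Fig. 3.

```python
def shared_contact_factor(a, ap, contacts_of_a, contacts_of_ap):
    """Return m = f + 1, where f is the number of contact accounts
    shared by the contact groups of `a` and `ap`.

    The two accounts themselves are excluded from the shared set.
    """
    shared = (set(contacts_of_a) & set(contacts_of_ap)) - {a, ap}
    return len(shared) + 1


contacts_of_a = {"A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"}

# A7 is assumed to share A1 and A8 with account A, so f = 2.
m_a7 = shared_contact_factor("A", "A7", contacts_of_a, {"A", "A1", "A8"})

# A hypothetical peer "B" whose only contact is A: f = 0.
m_b = shared_contact_factor("A", "B", contacts_of_a | {"B"}, {"A"})
```

The first call gives m=3, as in the A7 example, and the second gives m=1, the value used when no contact account is shared.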
Based on the two rules above, the reputation value of account A can be determined by formula (7):
In formula (7), R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is a peer account communicating with account A, 0≤p≤N, and N is the total number of peer accounts communicating with account A; τ(A, Ap) is the trust evaluation of account A by peer account Ap; and ω denotes the damping coefficient, typically 0.85. The damping coefficient ensures that, after a finite number of iterations over the account reputation values, R^(n+1)(A) yields a result close to the intrinsic reputation value of each account. The initial iteration value is R^(0)(A)=1.
Here, those skilled in the art may also use various existing techniques to determine the degree of trust between account A and peer account Ap, which are not described further here. Moreover, in practical applications a distributed computing model or a cloud computing platform is generally used to determine the reputation values of all accounts. Again taking a cloud computing platform as an example, the following describes the specific steps involved when step 103 in the embodiment of the present invention uses the distribution and merge model of cloud computing:
Step 501: prepare 4 files. File 1 holds the communication records of the accounts, that is, each line of file 1 lists the peer accounts communicating with account A. File 2 holds the accounts in the core contact group of account A; each line of file 2 records one account and the numbers in its core contact group. File 3 holds the communication records of the accounts for the current day, in the same format as file 1. File 4 records the initial reputation values of the current day's peer accounts; each line of file 4 holds one number and its corresponding reputation value, and the initial reputation value of every number is 1;
Step 502: in the cloud environment, use the account in file 1 as the key of the distribution model of cloud computing, and collect all peer accounts into the merge model according to that key. In addition, because l in formula (4) must be calculated, the peer account must also be used as a key in the distribution model, which guarantees that the number of communications with the account can be obtained;
Step 503: in the merge model of cloud computing, determine the reputation value of the account by formula (7) above;
Step 504: repeat the calculation of step 503 iteratively to obtain the reputation value R^(n+1)(A) of the account at the (n+1)-th iteration;
Step 505:Determine the absolute value of the difference of the last all account reputation values and preceding primary all account reputation values δ,
δ=| | AVG (R(n+1))-AVG(R(n))||(8);In formula (8), AVG expressions are averaged, | | expression takes absolutely Value, R(n)(A) it indicates the reputation value of all accounts, therefore is reputation value vector;The reputation average value AVG of (n+1)th all account (R(n+1)) all accounts of n-th reputation average value AVG (R(n)) be less than specified threshold epsilon or iterations reach the upper limit, then it is defeated The reputation value for going out each account, otherwise jumps to step 402;
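Steps 502 to 505 can be sketched on a single machine as follows. This is a minimal illustration, not the patent's cloud implementation: the function names (`distribute`, `aggregate`, `converged`, `iterate`), the record layout of (account, peer) pairs and the `step` callback that performs one round of reputation updates are all assumptions introduced here. Only the stopping rule follows formula (8) directly.

```python
from collections import defaultdict
from statistics import mean


def distribute(records):
    """Distribute phase: key each record by the account and by the peer,
    so both sides of a communication can be collected (cf. step 502)."""
    for account, peer in records:
        yield account, peer
        yield peer, account


def aggregate(pairs):
    """Aggregate phase: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def converged(previous, current, epsilon):
    """Formula (8): delta = |AVG(R^(n+1)) - AVG(R^(n))| compared with epsilon."""
    delta = abs(mean(current.values()) - mean(previous.values()))
    return delta < epsilon


def iterate(reputation, step, epsilon=1e-6, max_iterations=100):
    """Repeat `step` (one full round of updates, hypothetical callback)
    until formula (8) falls below epsilon or the iteration cap is reached."""
    for _ in range(max_iterations):
        updated = step(reputation)
        if converged(reputation, updated, epsilon):
            return updated
        reputation = updated
    return reputation


# One record per communication; repeated pairs mean repeated communications.
records = [("A", "A1"), ("A", "A1"), ("A", "A2")]
peers_by_account = aggregate(distribute(records))
```

Because one record is kept per communication, the length of each aggregated list gives a communication count, which is one plausible way to obtain the quantity l mentioned in step 502.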
Further, in step 104 above, whether the account is a spam account is determined according to the reputation value of the account, or according to the reputation value of the account and the number of communications of the account. The specific process is as follows:
Here, the embodiment of the present invention uses the decision tree model of the SPRINT algorithm to determine which accounts are spam accounts, and introduces it into the cloud computing platform, including the following steps:
Step 601: sort the reputation values and determine the possible split points;
Here, since the reputation value attribute of an account is a continuous variable, the reputation values are first sorted in order to determine the possible split points.
Specifically, the reputation values can be sorted in ascending order. For example, for the natural-number sequence [1, 10], i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, the average of each pair of adjacent numbers can be taken as a possible split point, so the possible split points are 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5 and 9.5. If 2.5 is selected as the split point, the sequence [1, 10] is divided into [1, 2] and [3, 10]. Those skilled in the art may determine the split points using various prior-art techniques, which are not described again here.
Step 602: in the distribute mode of cloud computing, use the split point as the key and distribute each row of data to the corresponding merge mode according to the key;
Here, continuing the example in step 601, suppose 5.5 is taken as the split point of the above sequence [1, 10]. The split point 5.5 is then used as the key of [1, 10] and written together with it as one row, for example:
5.5,1,2,3,4,5,6,7,8,9,10;
Step 603: in each merge mode, compute in parallel the Gini value of the corresponding split point;
Step 604: determine the minimum Gini value over the reputation values, thereby determining the split point;
Step 605: determine the boundary between spam accounts and normal accounts by the split point;
Step 606: the values output by the model are the split points for the account reputation value and for the number of short messages sent, together with the tree classification rules under those splits;
Step 607: after the decision tree model has been established, the calculated reputation results can be input into the model to identify whether a number is one that sends spam short messages.
Here, in steps 601 to 605, whether the account is a spam account can also be determined by the number of communications; the process is similar to the above process of determining whether the account is a spam account by the reputation value, and is not described again here.
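Steps 601 to 604 can be illustrated with a small single-machine sketch. It assumes labeled training samples of the form (reputation value, label), with label 1 for a known spam account and 0 for a normal account; the labels, the function names and the sequential loop (in place of the parallel merge modes) are assumptions made here for illustration and are not taken from the patent.

```python
def candidate_split_points(values):
    """Step 601: sort the values and take the midpoint of each adjacent pair."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def split_gini(samples, cut):
    """Step 603: weighted Gini value of splitting the samples at `cut`."""
    left = [label for value, label in samples if value < cut]
    right = [label for value, label in samples if value >= cut]
    n = len(samples)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(samples):
    """Step 604: the candidate split point with the minimum Gini value."""
    cuts = candidate_split_points(value for value, _ in samples)
    return min(cuts, key=lambda cut: split_gini(samples, cut))


# Hypothetical training data: low reputation values labeled as spam (1).
samples = [(1, 1), (2, 1), (3, 0), (4, 0), (5, 0)]
boundary = best_split(samples)
```

For the sequence 1 to 10, `candidate_split_points` yields the nine midpoints 1.5 through 9.5 listed in step 601. In the hypothetical data above, the split at 2.5 separates the two classes completely, so it has the smallest Gini value.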
Fig. 4 is a schematic diagram of the composition of the information filtering device according to an embodiment of the present invention. As shown in Fig. 4, the device includes a first determination unit 701, a second determination unit 702 and a third determination unit 703, wherein:
The first determination unit 701 is configured to determine, according to the communication records of each account, the core contact group and the non-core contact group corresponding to the account;
The second determination unit 702 is configured to determine the reputation value of the account according to the communication records of each member of the core contact group and the communication records of each member of the non-core contact group;
The third determination unit 703 is configured to determine whether the account is a spam account according to the reputation value of the account.
Further, the device also includes a fourth determination unit 704 and an interception unit 705, wherein:
The fourth determination unit 704 is configured to determine the communication records of each account by data extraction, transformation and loading (ETL) and/or a blacklist and whitelist method;
The interception unit 705 is configured to intercept the information sent by the spam account.
Preferably, the second determination unit determines the reputation value of the account by the following formula:
In the above formula, R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is a peer account communicating with account A, 0 ≤ p ≤ N, where N is the total number of peer accounts communicating with account A; ω denotes the damping factor; the initial iteration value is R^(0)(A) = 1; τ(A, Ap) is peer account Ap's trust-degree evaluation of account A. When the peer account Ap communicating with account A belongs to the core contact group of account A, the trust degree between account A and peer account Ap is determined by the following formula:
In the above formula, R(Ap) denotes the reputation value of peer account Ap, with R(Ap) = 1; k denotes the number of accounts in Ap's core contact group with which peer account Ap communicates;
When the peer account Ap communicating with account A does not belong to the core contact group of account A, the trust degree between account A and peer account Ap is determined by the following formula:
In the above formula, R(Ap) denotes the reputation value of peer account Ap, with R(Ap) = 1; h denotes the number of accounts in Ap's non-core contact group with which peer account Ap communicates; l denotes the number of times peer account Ap communicates with account A; m denotes the relationship between peer account Ap and the other peer accounts of account A: m is related to the number f of common contact accounts shared by the contact group of account A and the contact group of peer account Ap, with m = f + 1.
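The quantity m can be computed directly from the two contact groups. The sketch below is a minimal illustration of the stated relation m = f + 1; the function name `relationship_factor` and the representation of a contact group as a Python set of account identifiers are assumptions made here.

```python
def relationship_factor(contacts_of_a, contacts_of_peer):
    """Return m = f + 1, where f is the number of contact accounts common
    to account A's contact group and peer account Ap's contact group."""
    f = len(set(contacts_of_a) & set(contacts_of_peer))
    return f + 1


# No common contacts: f = 0, so m = 1.
m_no_overlap = relationship_factor({"A1", "A2", "A3"}, {"B1", "B2"})

# Two common contacts (A2 and A3): f = 2, so m = 3.
m_overlap = relationship_factor({"A1", "A2", "A3"}, {"A2", "A3", "B1"})
```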
Fig. 5 is a schematic diagram of the composition of the first determination unit in Fig. 4. As shown in Fig. 5, the first determination unit further includes a first determination subunit 801, a second determination subunit 802 and a third determination subunit 803, wherein:
The first determination subunit 801 is configured to determine the peer accounts Ap communicating with account A, 0 ≤ p ≤ N, where N is the total number of accounts communicating with account A;
The second determination subunit 802 is configured to determine whether a communication relationship exists between the peer accounts of the account;
The third determination subunit 803 is configured to assign the peer accounts that have a communication relationship with other peer accounts to the core contact group of account A, and otherwise to assign the peer account to the non-core contact group of account A.
Fig. 6 is a schematic diagram of the composition of the second determination subunit in Fig. 5. As shown in Fig. 6, the second determination subunit further includes a setting module 901, a judgment module 902 and a determining module 903, wherein:
The setting module 901 is configured to preset a communication threshold for the number of communications between any two peer accounts;
The judgment module 902 is configured to judge the relationship between the number of communications between any two peer accounts and the communication threshold;
The determining module 903 is configured to determine that a communication relationship exists between two peer accounts when the judgment result is that the number of communications between them is greater than or equal to the communication threshold, and otherwise to determine that no communication relationship exists between the two peer accounts.
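The grouping performed by subunits 801 to 803 and modules 901 to 903 can be sketched as follows. This is an illustration under stated assumptions: communication counts are held in a dictionary keyed by unordered account pairs, and the helper names `pair` and `split_contact_groups` are hypothetical, not part of the patent.

```python
from itertools import combinations


def pair(x, y):
    """Unordered key for the communication count between two accounts."""
    return frozenset((x, y))


def split_contact_groups(account, comm_counts, threshold):
    """Split the peers of `account` into (core, non_core) contact groups.

    comm_counts: dict mapping pair(x, y) -> number of communications.
    A peer is placed in the core group when its communication count with
    at least one other peer of `account` reaches `threshold`.
    """
    # Subunit 801: every account that has communicated with `account`.
    peers = set()
    for key in comm_counts:
        if account in key and len(key) == 2:
            peers.update(key - {account})

    # Subunit 802 / modules 901-903: test every pair of peers.
    core = set()
    for p, q in combinations(sorted(peers), 2):
        if comm_counts.get(pair(p, q), 0) >= threshold:
            core.update((p, q))

    # Subunit 803: the remaining peers form the non-core group.
    return core, peers - core


comm_counts = {
    pair("A", "A1"): 5,
    pair("A", "A2"): 3,
    pair("A", "A3"): 1,
    pair("A1", "A2"): 4,   # A1 and A2 also talk to each other
    pair("A2", "A3"): 1,   # below the threshold used below
}
core_group, non_core_group = split_contact_groups("A", comm_counts, threshold=2)
```

In this example A1 and A2 communicate with each other at least twice, so both land in the core contact group of A, while A3 stays in the non-core contact group.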
In practical applications, the information filtering device of the embodiment of the present invention is generally implemented with a distributed computing model or a cloud computing platform. Here, a cloud computing platform is taken as an example to illustrate the composition of the information filtering device of the embodiment of the present invention. Fig. 7 is a schematic diagram of the composition of the information filtering device when the embodiment of the present invention uses a cloud computing platform. As shown in Fig. 7, the information filtering device in that case includes a cloud platform layer 1000 and a data analysis layer 1010, wherein:
The cloud platform layer 1000 further includes a first communication record unit 1001, a second communication record unit 1002 and a core contact group unit 1003, wherein:
The first communication record unit 1001 is configured to store the historical communication records and the current day's communication records of all accounts;
The second communication record unit 1002 is configured to store the filtered communication records of all accounts;
The core contact group unit 1003 is configured to store the core contact groups of all accounts;
The data analysis layer 1010 further includes a communication record determination unit 1011, a core contact group determination unit 1012, a reputation value determination unit 1013, an account identification unit 1014 and a decision tree generation unit 1015, wherein:
The communication record determination unit 1011 is configured to filter the communication records of each account stored in the first communication record unit 1001, and to output the filtered communication records of each account to the second communication record unit 1002;
The core contact group determination unit 1012 is configured to determine the core contact group corresponding to each account according to the communication records of the account stored in the second communication record unit 1002, and to output the determined core contact group of the account to the core contact group unit 1003;
The reputation value determination unit 1013 is configured to determine the reputation value of the account according to the reputation value of each member of the core contact group and the reputation value of each member of the non-core contact group, and to output the reputation value to the account identification unit 1014;
Here, the task of determining account reputation values is executed in parallel using the distribute and merge computing model of the cloud platform. During the iterative calculation of account reputation values, each iteration loads the reputation values determined in the previous round.
In the merge phase, all distributed results are merged to obtain the new round's result for R. The result of R for each round can be stored at a fixed location. When the difference between the results of two rounds is less than a threshold, or the number of iterations reaches a specified value, R is considered to have reached a stable value, and the final reputation value of each number is obtained.
The account identification unit 1014 is configured to determine whether the account is a spam account according to the reputation value of the account, and to output the identification result to the decision tree generation unit 1015.
The decision tree generation unit 1015 is configured to train a decision tree according to the identification result of the account identification unit 1014;
Further, the data analysis layer 1010 in the embodiment of the present invention may also include a spam information interception unit, configured to intercept the information sent by the spam account.
If the above integrated unit of the present invention is implemented in the form of a software function module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiment of the present invention, in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the method described in each embodiment of the present invention. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The foregoing is only a preferred embodiment of the present invention, is not intended to limit the scope of the present invention.

Claims (8)

1. An information filtering method, characterized in that the method comprises:
determining, according to the communication records of each account, the core contact group and the non-core contact group corresponding to the account;
determining the reputation value R^(n+1)(A) of the account according to the communication records of each member of the core contact group and the communication records of each member of the non-core contact group, including:
wherein R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is a peer account communicating with account A, 0 ≤ p ≤ N, N being the total number of peer accounts communicating with account A; ω denotes the damping factor; the initial iteration value is R^(0)(A) = 1; τ(A, Ap) is peer account Ap's trust-degree evaluation of account A;
when the peer account Ap communicating with account A belongs to the core contact group of account A, determining the trust degree between account A and peer account Ap by the following formula:
wherein R(Ap) denotes the reputation value of peer account Ap, with R(Ap) = 1; k denotes the number of accounts in Ap's core contact group with which peer account Ap communicates;
when the peer account Ap communicating with account A does not belong to the core contact group of account A, determining the trust degree between account A and peer account Ap by the following formula:
wherein R(Ap) denotes the reputation value of peer account Ap, with R(Ap) = 1; h denotes the number of accounts in Ap's non-core contact group with which peer account Ap communicates; l denotes the number of times peer account Ap communicates with account A; m denotes the relationship between peer account Ap and the other peer accounts of account A: m is related to the number f of common contact accounts shared by the contact group of account A and the contact group of peer account Ap, with m = f + 1;
determining whether the account is a spam account according to the reputation value of the account;
when the account is a spam account, intercepting the information sent by the account.
2. The method according to claim 1, characterized in that determining, according to the communication records of each account, the core contact group and the non-core contact group corresponding to the account comprises:
for each account A:
determining the peer accounts Ap communicating with account A, wherein 0 ≤ p ≤ N, N being the total number of accounts communicating with account A;
determining whether a communication relationship exists between the peer accounts of account A;
assigning the peer accounts that have a communication relationship with other peer accounts to the core contact group of account A, and otherwise assigning the peer account to the non-core contact group of account A.
3. The method according to claim 2, characterized in that determining whether a communication relationship exists between the peer accounts of account A comprises:
judging the relationship between the number of communications between any two peer accounts and a preset communication threshold;
when the judgment result is that the number of communications between any two peer accounts is greater than or equal to the communication threshold, determining that a communication relationship exists between the two peer accounts; otherwise, determining that no communication relationship exists between the two peer accounts.
4. The method according to claim 2, characterized in that, before determining the core contact group corresponding to the account according to the communication records of each account, the method further comprises:
determining the communication records of each account by data extraction, transformation and loading (ETL) and/or a blacklist and whitelist method.
5. An information filtering device, characterized in that the device comprises a first determination unit, a second determination unit, a third determination unit and an interception unit, wherein:
the first determination unit is configured to determine, according to the communication records of each account, the core contact group and the non-core contact group corresponding to the account;
the second determination unit is configured to determine the reputation value R^(n+1)(A) of the account according to the communication records of each member of the core contact group and the communication records of each member of the non-core contact group, including:
wherein R^(n+1)(A) denotes the reputation value of account A at the (n+1)-th iteration; Ap is a peer account communicating with account A, 0 ≤ p ≤ N, N being the total number of peer accounts communicating with account A; ω denotes the damping factor; the initial iteration value is R^(0)(A) = 1; τ(A, Ap) is peer account Ap's trust-degree evaluation of account A;
when the peer account Ap communicating with account A belongs to the core contact group of account A, the trust degree between account A and peer account Ap is determined by the following formula:
wherein R(Ap) denotes the reputation value of peer account Ap, with R(Ap) = 1; k denotes the number of accounts in Ap's core contact group with which peer account Ap communicates;
when the peer account Ap communicating with account A does not belong to the core contact group of account A, the trust degree between account A and peer account Ap is determined by the following formula:
wherein R(Ap) denotes the reputation value of peer account Ap, with R(Ap) = 1; h denotes the number of accounts in Ap's non-core contact group with which peer account Ap communicates; l denotes the number of times peer account Ap communicates with account A; m denotes the relationship between peer account Ap and the other peer accounts of account A: m is related to the number f of common contact accounts shared by the contact group of account A and the contact group of peer account Ap, with m = f + 1;
the third determination unit is configured to determine whether the account is a spam account according to the reputation value of the account;
the interception unit is configured to intercept the information sent by the spam account.
6. The device according to claim 5, characterized in that the first determination unit further comprises a first determination subunit, a second determination subunit and a third determination subunit, wherein:
the first determination subunit is configured to determine the peer accounts Ap communicating with account A, wherein 0 ≤ p ≤ N, N being the total number of accounts communicating with account A;
the second determination subunit is configured to determine whether a communication relationship exists between the peer accounts of the account;
the third determination subunit is configured to assign the peer accounts that have a communication relationship with other peer accounts to the core contact group of account A, and otherwise to assign the peer account to the non-core contact group of account A.
7. The device according to claim 6, characterized in that the second determination subunit further comprises a setting module, a judgment module and a determining module, wherein:
the setting module is configured to preset a communication threshold for the number of communications between any two peer accounts;
the judgment module is configured to judge the relationship between the number of communications between any two peer accounts and the communication threshold;
the determining module is configured to determine that a communication relationship exists between two peer accounts when the judgment result is that the number of communications between them is greater than or equal to the communication threshold, and otherwise to determine that no communication relationship exists between the two peer accounts.
8. The device according to any one of claims 5 to 7, characterized in that the device further comprises a fourth determination unit, wherein:
the fourth determination unit is configured to determine the communication records of each account by data extraction, transformation and loading (ETL) and/or a blacklist and whitelist method.
CN201310403218.7A 2013-09-06 2013-09-06 A kind of information filtering method and device Active CN104427503B (en)

Publications (2)

Publication Number Publication Date
CN104427503A CN104427503A (en) 2015-03-18
CN104427503B true CN104427503B (en) 2018-09-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant