CN104462318A - Identity recognition method and device of identical names in multiple networks - Google Patents

Publication number
CN104462318A
Authority
CN
China
Application number
CN201410719649.9A
Other languages
Chinese (zh)
Inventor
王晶华
陈晰
徐慧明
郭光�
魏明磊
Original Assignee
国家电网公司
国网河北省电力公司
国网河北省电力公司衡水供电分公司
国家电网公司信息通信分公司
Application filed by 国家电网公司, 国网河北省电力公司, 国网河北省电力公司衡水供电分公司, 国家电网公司信息通信分公司
Priority to CN201410719649.9A
Publication of CN104462318A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The invention discloses an identity recognition method and device for identical names in multiple networks. The method comprises: obtaining user identity information and user identity correspondences in multiple networks; using the set of user identity information with known identity correspondences as a training set; building a minimum-energy model based on user behavior similarity from the user identity information in the training set, and obtaining energy factors and a matching-relation classifier; matching any two pieces of user identity information according to the matching-relation classifier and solving the energy matrix to obtain the matching result of a single prediction; and integrating the matching results of multiple solutions to obtain the user identity correspondences. The method and device can confirm whether records that have the same name but different identity information in multiple networks belong to the same person, which improves the accuracy of statistics. The algorithm is efficient and the computation is fast, and the accuracy of the results continues to improve as the sample database grows.

Description

Identity recognition method and device for identical names in multiple networks

Technical field

The present invention relates to the technical field of information processing, and in particular to an identity recognition method and device for identical names in multiple networks.

Background art

In general, the same user registers different identity information, such as e-mail address and telephone number, in different networks. In scientific research, for example, many researchers work in several research teams at the same time, so the personal information that one person uses when publishing academic results (e-mail, affiliation, address and so on) may differ from one publication to the next; that is, the same name carries different identity information. When the academic results of a field are aggregated, it is difficult to judge whether these identical names refer to the same person, and the resulting redundant information directly reduces the accuracy of the statistics. A researcher who works in different teams may have personal information in several networks, for example a university website, a paper database, a technology-transfer site and a patent-trading site, and this information is not necessarily the same in each network.

Traditional social network analysis methods usually consider only the behavioral features of a user in a single network (for example, a post held at a particular university) and ignore the fact that the user may be associated with several networks. A user may be active in universities, research institutes of state-owned enterprises and social research institutions at the same time, with a different identity, circle of contacts and research topics in each social network, so behavior analysis methods designed for a single network cannot be applied to this multi-layer network environment. In multiple networks, the nodes of each network may have quite different attributes, and there are association relations such as interdependence and cooperation between the networks and between their nodes. A method is therefore needed for establishing the identity of same-named individuals across multiple networks.

Summary of the invention

In view of this, the object of the present invention is to provide an identity recognition method and device for identical names in multiple networks, which can determine whether records that have the same name but different identity information in multiple networks belong to the same person.

To this end, the invention provides an identity recognition method for identical names in multiple networks, comprising: obtaining user identity information and user identity correspondences in multiple networks; using the set of user identity information with known identity correspondences as a training set; building a minimum-energy model based on user behavior similarity from the user identity information in the training set, and obtaining energy factors and a matching-relation classifier; matching any two pieces of user identity information according to the matching-relation classifier, filling in energies with the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction; and integrating the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with the same name have the same identity.

According to one embodiment of the invention, further, the steps of using the set of user identity information with known identity correspondences as a training set, building the minimum-energy model based on user behavior similarity from the user identity information in the training set, and obtaining the energy factors and the matching-relation classifier comprise: for any given node $V(i)$ in two networks P and Q, defining its network topology feature vector as $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents user identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors; establishing the node-pair feature vector, which for the two networks P and Q is $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; clustering the matched node pairs according to this node-pair feature vector to obtain the number of node pairs with similar features in each cluster class C, which is treated as a particle feature, taking the distribution of node pairs over features as the energy factor, and giving users with similar network behavior features the same energy factor; building the matched-node-pair energy model $\min H = \sum_{i=1}^{k} \beta_i \varepsilon_i$, where $\beta_i$ is the number of matched node pairs with similar features in class $i$ after clustering by feature and $\varepsilon_i$ is the energy factor of that class; obtaining from the energy model the energy factor of each cluster class, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k\}$, and using it in the prediction process as the energy factor of the class to which a node pair belongs; and building a K-classifier from the clustering result and assigning each node pair a class number.

According to one embodiment of the invention, further, the steps of matching any two pieces of user identity information according to the matching-relation classifier, filling in energies with the energy factors to form the energy matrix, and solving the energy matrix to obtain the matching result of a single prediction comprise: extracting the topological features of the nodes whose identity correspondence is unknown in networks P and Q, $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; for any nodes $i \in P$ and $j \in Q$ with unknown identity correspondence, building the $n \times n$ matched-node-pair feature vectors of all unknown nodes, $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$; classifying the matched-node-pair feature vectors with the K-classifier to obtain the class label of each node pair and build the node-pair classification matrix; filling the classification matrix with energy factors, that is, replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\mathrm{category}}$ of that class, to build the energy matrix; and computing the optimal matching of the energy matrix.

According to one embodiment of the invention, further, the optimal matching of the energy matrix is computed as:

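$$
\begin{aligned}
\min \ & \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}; \\
\text{s.t.} \ & \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n); \\
& \sum_{j=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n), \\
& \lambda_{ij} \in \{0, 1\};
\end{aligned}
$$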

where $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q correspond one to one; it is set to 1 if the correspondence holds and to 0 otherwise, and the matching result is written $V_P(i) \leftrightarrow V_Q(j)$.

According to one embodiment of the invention, further, the steps of integrating the matching results of multiple solutions, obtaining the correspondence of user identity information and determining whether identical names have the same identity comprise: obtaining ξ prediction results and casting each prediction result as votes in the node-pair matching matrix to obtain the voting matrix V-Matrix $= (V_{ij})$; and solving the optimal matching problem of the voting matrix V-Matrix with the following formula:

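$$
\begin{aligned}
\max \ & \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}; \\
\text{s.t.} \ & \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n); \\
& \sum_{j=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n); \\
& \lambda_{ij} \in \{0, 1\};
\end{aligned}
$$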

where $v_{ij}$ is the number of votes in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q correspond one to one, that is, it represents the final matching result of the node pair.

To the same end, the invention provides an identity recognition device for identical names in multiple networks, comprising: an information acquisition unit for obtaining user identity information and user identity correspondences in multiple networks; a training set generation unit for using the set of user identity information with known identity correspondences as a training set, building a minimum-energy model based on user behavior similarity from the user identity information in the training set, and obtaining energy factors and a matching-relation classifier; a matching unit for matching any two pieces of user identity information according to the matching-relation classifier, filling in energies with the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction; and an integration unit for integrating the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with the same name have the same identity.

According to one embodiment of the invention, further, the training set generation unit comprises: a node-pair feature establishing submodule, configured to define, for any given node $V(i)$ in two networks P and Q, its network topology feature vector as $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents user identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors; to establish the node-pair feature vector, which for the two networks P and Q is $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; and to cluster the matched node pairs according to this node-pair feature vector to obtain the number of node pairs with similar features in each cluster class C, which is treated as a particle feature, taking the distribution of node pairs over features as the energy factor and giving users with similar network behavior features the same energy factor; and a classifier generation submodule, configured to build the matched-node-pair energy model $\min H = \sum_{i=1}^{k} \beta_i \varepsilon_i$, where $\beta_i$ is the number of matched node pairs with similar features in class $i$ after clustering by feature and $\varepsilon_i$ is the energy factor of that class; to obtain from the energy model the energy factor of each cluster class, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k\}$, and use it in the prediction process as the energy factor of the class to which a node pair belongs; and to build a K-classifier from the clustering result and assign each node pair a class number.

According to one embodiment of the invention, further, the training set generation unit also comprises: a node-pair classification establishing submodule, configured to extract the topological features of the nodes whose identity correspondence is unknown in networks P and Q, $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; to build, for any nodes $i \in P$ and $j \in Q$ with unknown identity correspondence, the $n \times n$ matched-node-pair feature vectors of all unknown nodes, $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$; and to classify the matched-node-pair feature vectors with the K-classifier to obtain the class label of each node pair and build the node-pair classification matrix. The matching unit is further configured to fill the classification matrix with energy factors, replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\mathrm{category}}$ of that class, to build the energy matrix, and to compute the optimal matching of the energy matrix.

According to one embodiment of the invention, further, the matching unit computes the optimal matching of the energy matrix as:

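$$
\begin{aligned}
\min \ & \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}; \\
\text{s.t.} \ & \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n); \\
& \sum_{j=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n), \\
& \lambda_{ij} \in \{0, 1\};
\end{aligned}
$$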

where $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q correspond one to one; it is set to 1 if the correspondence holds and to 0 otherwise, and the matching result is written $V_P(i) \leftrightarrow V_Q(j)$.

According to one embodiment of the invention, further, the integration unit is also configured to obtain ξ prediction results and cast each prediction result as votes in the node-pair matching matrix to obtain the voting matrix V-Matrix $= (V_{ij})$, and to solve the optimal matching problem of the voting matrix V-Matrix with the following formula:

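$$
\begin{aligned}
\max \ & \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}; \\
\text{s.t.} \ & \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n); \\
& \sum_{j=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n); \\
& \lambda_{ij} \in \{0, 1\};
\end{aligned}
$$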

where $v_{ij}$ is the number of votes in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q correspond one to one, that is, it represents the final matching result of the node pair.

As can be seen from the above, the identity recognition method and device for identical names in multiple networks can confirm whether records that have the same name but different identity information in multiple networks belong to the same person, which improves the accuracy of statistics. In addition, the algorithm is efficient and the computation is fast, and the accuracy of the results continues to improve as the sample database grows.

Brief description of the drawings

Fig. 1 is a flow chart of one embodiment of the identity recognition method for identical names in multiple networks according to the invention;

Fig. 2 is a flow chart of another embodiment of the identity recognition method for identical names in multiple networks according to the invention;

Fig. 3 is a schematic diagram of the classification matrix of arbitrary node pairs in a two-layer network in the identity recognition method according to the invention;

Fig. 4 is a schematic diagram of energy factor filling in the energy matrix in the identity recognition method according to the invention;

Fig. 5 is a schematic diagram of the voting matrix for ξ = 2 in the identity recognition method according to the invention;

Fig. 6 is a schematic diagram of the voting and integration algorithm process in the identity recognition method according to the invention;

Fig. 7 is a schematic diagram of one embodiment of the identity recognition device for identical names in multiple networks according to the invention.

Detailed description of the embodiments

To make the object, technical solutions and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.

Fig. 1 is a flow chart of one embodiment of the identity recognition method for identical names in multiple networks according to the invention. As shown in Fig. 1:

Step 101: obtain the user identity information and user identity correspondences in multiple networks.

Step 102: use the set of user identity information with known identity correspondences as the training set.

Step 103: build a minimum-energy model based on user behavior similarity from the user identity information in the training set, and obtain the energy factors and the matching-relation classifier.

Step 104: match any two pieces of user identity information according to the matching-relation classifier, fill in energies with the energy factors to form an energy matrix, and solve the energy matrix to obtain the matching result of a single prediction.

Step 105: integrate the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with the same name have the same identity.

The key to determining the identity of same-named individuals in multiple networks is to find the correspondence between the nodes that an individual with multiple identities occupies in different networks, that is, the node matching problem between networks. The topology of the different networks, together with the large amount of interaction information between individuals recorded on the Internet (for example on social networking sites), makes it possible to solve the matching problem between different network layers. For example, nodes with the same identity in different networks can be identified to some extent by degree, clustering coefficient, neighbor structure, common friends and so on.

Fig. 2 is a flow chart of another embodiment of the identity recognition method for identical names in multiple networks according to the invention. As shown in Fig. 2:

Steps 201-208 are model building. The set of users with known identity correspondences is used as the training set. From the information of the known node pairs, a minimum-energy model based on user behavior similarity is built to obtain the energy factors of the node pairs, and a node-pair matching-relation classifier is trained to guide the matching of node pairs whose matching relation is unknown.

Steps 210-216 are node matching. Any two nodes are matched with the classifier obtained during model building, energy factors are filled in, and the node-pair matching result of a single prediction is obtained by solving the energy minimization problem after filling.

Steps 217-219 are voting integration. The prediction results of multiple node matching runs are integrated to obtain the final correspondence of user identities in the multi-layer network, which is used to judge whether identical names refer to the same identity.

The Ising model is a model that describes phase transitions of matter. After a phase transition, a material exhibits new structure and physical properties. Systems that undergo phase transitions are generally systems with strong interactions between molecules, also known as cooperative systems. In the model building process, the principle of the Ising model is applied to the node matching process of two networks: the topological features $f$ of the nodes in each network are extracted, and the matched-node-pair feature vector $F_{P \leftrightarrow Q}$ is established.

The matched node pairs are clustered according to this feature vector to obtain the number of node pairs with similar features in each cluster class C, which is treated as a particle feature. The distribution of node pairs over features is taken as the magnetic probability (energy factor), and user groups with similar network behavior features are given the same energy factor. According to the maximum energy criterion of the spin model, the total energy of the system is assumed to be minimal when all node pairs in the two-layer network are matched correctly, and the matched-node-pair energy model is built accordingly:

$$\min H = \sum_{i=1}^{k} \beta_i \varepsilon_i \qquad (1)$$

where $\beta_i$ is the number of matched node pairs with similar features in class $i$ after clustering by feature, and $\varepsilon_i$ is the energy factor of that class.

A nonlinear optimization method is used to obtain the energy factor of each cluster class, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k\}$, which is used in the prediction process as the energy factor of the class to which a node pair belongs. A K-classifier (CLASSIFIER) is built from the clustering result above, and each node pair is assigned a class number.
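The quantities in formula (1) can be illustrated with a minimal Python sketch. It only counts the cluster sizes $\beta_i$ and evaluates $H$ for given energy factors; the text does not state the constraints or the nonlinear optimization method used to obtain $\varepsilon$, so no solver is shown. The function names, the cluster labels and the energy factor values are hypothetical.

```python
from collections import Counter

def cluster_sizes(labels, k):
    """Return [beta_0, ..., beta_{k-1}]: number of matched node pairs per cluster."""
    counts = Counter(labels)
    return [counts.get(c, 0) for c in range(k)]

def total_energy(beta, eps):
    """Evaluate H = sum_i beta_i * eps_i, the objective of formula (1)."""
    return sum(b * e for b, e in zip(beta, eps))

# Hypothetical cluster labels for six known matched node pairs, K = 3.
labels = [0, 0, 1, 2, 2, 2]
beta = cluster_sizes(labels, 3)
eps = [0.5, 1.0, 0.25]  # illustrative energy factors, not values from the patent
H = total_energy(beta, eps)
```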

In one embodiment, the prerequisite for building the energy model is a vector representation of the network nodes. For any given node $V(i)$, its network topology feature vector is defined as $f(i) = \{f_1, f_2, \ldots, f_d\}$, where $f_1, \ldots, f_d$ may be basic attribute features of the node, such as out-degree, in-degree, clustering coefficient, neighbor nodes and average degree, or extended attribute features between two nodes, such as common neighbors and the Jaccard coefficient.
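As an illustration of such a feature vector, the following sketch computes three of the listed basic features (out-degree, in-degree, local clustering coefficient) and one extended feature (the Jaccard coefficient) for a small directed graph. The adjacency-dictionary representation, the function names and the choice to compute the clustering coefficient on the undirected neighborhood are assumptions made for the example, not details given in the text.

```python
def node_features(adj, v):
    """Return [out-degree, in-degree, clustering coefficient] for node v.

    adj maps each node to the set of nodes it points to (directed graph).
    Node identifiers must be mutually comparable (integers here).
    """
    out_nb = adj.get(v, set())
    in_nb = {u for u, targets in adj.items() if v in targets}
    nb = (out_nb | in_nb) - {v}  # neighbors of v, ignoring direction
    # Count links (in either direction) among the neighbors of v.
    links = sum(1 for a in nb for b in nb
                if a < b and (b in adj.get(a, set()) or a in adj.get(b, set())))
    possible = len(nb) * (len(nb) - 1) / 2
    clustering = links / possible if possible else 0.0
    return [len(out_nb), len(in_nb), clustering]

def jaccard(adj, u, v):
    """Jaccard coefficient of the out-neighbor sets of u and v."""
    a, b = adj.get(u, set()), adj.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy network.
adj = {1: {2, 3}, 2: {3}, 3: {1}, 4: {1}}
f1 = node_features(adj, 1)
```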

On this basis, the structural feature vector of a node pair in a multi-layer network is the set of basic attribute features and extended attribute features. For two networks (a two-layer network), the node pair can be expressed as the vector:

$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}.$$

In one embodiment, in the node matching process for two networks (a two-layer network), the node-pair classification matrix contains every possible node match between the two layers. The identity prediction problem for node pairs can therefore be converted into a bipartite-graph optimal matching problem: finding the correct one-to-one matching among the many possibilities, that is, using an optimization algorithm so that every row and every column of the matrix contains exactly one match.

According to the minimum-energy model, to minimize the global energy of the matching result, the classification matrix must first be filled with energy factors: each class label in the matrix is replaced with the energy factor of that class, which gives the energy matrix shown in Fig. 3. In this $n \times n$ energy matrix, the goal of the method is to find the $n$ best matches that minimize the system energy. Many algorithms are available for this, for example the Hungarian algorithm.

The Hungarian algorithm is one of many algorithms for solving the linear assignment problem and is a classic algorithm for the bipartite-graph maximum matching problem. Let G = (V, E) be an undirected graph. If the vertex set V can be partitioned into two disjoint subsets V1 and V2 such that the two endpoints of every edge belong to different subsets, G is called a bipartite graph, also written G = (V1, V2, E). Given a bipartite graph G and a subgraph M of G, if no two edges in the edge set of M share a vertex, M is called a matching. Finding a matching with the largest number of edges is the maximum matching problem of the graph. If every vertex of the graph is incident to some edge of the matching, the matching is called a complete matching, also known as a perfect matching.
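The definitions of a matching and a perfect matching can be checked mechanically. The sketch below is a small illustration with hypothetical vertex names; it represents a candidate matching as a list of (vertex in V1, vertex in V2) pairs.

```python
def is_matching(edges):
    """Return True if no two edges share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def is_perfect_matching(edges, v1, v2):
    """Return True if edges form a matching that covers every vertex of V1 and V2."""
    covered = {x for e in edges for x in e}
    return is_matching(edges) and covered == set(v1) | set(v2)

# Hypothetical bipartite vertex sets.
v1, v2 = ["p1", "p2"], ["q1", "q2"]
```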

For two networks, the topological features of the nodes whose identity correspondence is unknown are first extracted in each network: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$.

For any nodes $i \in P$ and $j \in Q$, the $n \times n$ possible matched-node-pair feature vectors are built:

$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$$

These vectors are classified with the K-classifier to obtain the class label of each node pair, which gives the node-pair classification matrix shown in Fig. 3.
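The construction of the classification matrix can be sketched as follows. The text does not specify what kind of classifier the K-classifier is; this example assumes a nearest-centroid rule over the cluster centers, which is only one plausible choice. The function names, feature values and centroids are hypothetical.

```python
def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centroids]
    return dists.index(min(dists))

def class_matrix(feats_p, feats_q, centroids):
    """Build the n x n matrix whose (i, j) entry is the class of node pair (i, j).

    The pair feature vector is the concatenation of the two node feature vectors.
    """
    return [[nearest_centroid(fp + fq, centroids) for fq in feats_q]
            for fp in feats_p]

# Hypothetical 1-dimensional node features and K = 3 pair-level centroids.
feats_p = [[1.0], [5.0]]
feats_q = [[1.0], [5.0]]
centroids = [[1.0, 1.0], [5.0, 5.0], [3.0, 3.0]]
C = class_matrix(feats_p, feats_q, centroids)
```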

The node-pair classification matrix contains every possible node match between the two layers. The identity prediction problem for node pairs can therefore be converted into a bipartite-graph optimal matching problem: finding the correct one-to-one matching among the many possibilities, that is, using an optimization algorithm so that every row and every column of the matrix contains exactly one match. According to the minimum-energy model, to minimize the global energy of the matching result, the classification matrix must first be filled with energy factors: each class label in the matrix is replaced with the energy factor $\varepsilon_{i=\mathrm{category}}$ of that class, which gives the energy matrix shown in Fig. 4.
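Energy factor filling is a direct lookup, as the following minimal sketch shows. The class labels and energy factor values are hypothetical.

```python
def fill_energy(class_mat, eps):
    """Replace each class label in the matrix with the energy factor of that class."""
    return [[eps[label] for label in row] for row in class_mat]

class_mat = [[0, 2], [2, 1]]    # hypothetical class labels
eps = {0: 0.2, 1: 0.3, 2: 0.9}  # hypothetical energy factors per class
E = fill_energy(class_mat, eps)
```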

In the energy matrix, the goal is to find the $n$ best matches that minimize the system energy. The Hungarian algorithm is used to compute this optimal matching; its mathematical model is as follows:

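$$
\begin{aligned}
\min \ & \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij} \\
\text{s.t.} \ & \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n) \\
& \sum_{j=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n) \\
& \lambda_{ij} \in \{0, 1\}
\end{aligned}
$$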

where $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q correspond one to one; it is set to 1 if the correspondence holds and to 0 otherwise. Without loss of generality, the matching result is written $V_P(i) \leftrightarrow V_Q(j)$.
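For reference, the assignment problem above can be solved with a compact implementation of the Hungarian (Kuhn-Munkres) algorithm using row and column potentials. This is a generic textbook version written for illustration, not code taken from the patent; the function name and the example cost matrix are hypothetical.

```python
def hungarian(cost):
    """Solve the n x n assignment problem; return (min_total, col_of_row)."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)   # way[j] = previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Update potentials so that at least one new edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augment
                break
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    total = sum(cost[i][match[i]] for i in range(n))
    return total, match

# Hypothetical 3 x 3 energy matrix.
E = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
total, match = hungarian(E)
```

Here `match[i] = j` corresponds to $\lambda_{ij} = 1$. In practice a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` could be used instead of a hand-written solver.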

In one embodiment, in the voting integration process, because the clustering algorithm chooses its cluster centers at random, the clustering result is not guaranteed to be globally optimal, which makes a single prediction result uncertain. To obtain a more stable and accurate result, the invention introduces an ensemble algorithm and obtains the final prediction through voting and a second round of matching optimization.

For given data to be predicted, the identity-correspondence prediction algorithm is first run ξ times to obtain ξ prediction results; each prediction result is then cast as votes in the node-pair matching matrix to obtain the voting matrix V-Matrix $= (V_{ij})$, as shown in Fig. 5. For example, with ξ = 2, two matching results are obtained. If the first matching result pairs nodes 1, 2, 3 and n of network P with nodes 1, 2, 3 and n of network Q respectively, then $V_{11}$, $V_{22}$, $V_{33}$ and $V_{nn}$ are set to 1. If the second matching result pairs nodes 1, 2, 3 and n of network P with nodes 1, 3, 2 and n of network Q respectively, then $V_{23}$ and $V_{32}$ are set to 1, and the values of $V_{11}$ and $V_{nn}$ are increased by 1.
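The voting step can be sketched in a few lines. The example mirrors the ξ = 2 case described above with n = 4 and 0-based indices; the function name and the representation of each run as a list `match` with `match[i] = j` are assumptions made for the illustration.

```python
def vote_matrix(results, n):
    """Accumulate the matchings of several runs into an n x n voting matrix."""
    votes = [[0] * n for _ in range(n)]
    for match in results:            # match[i] = j means V_P(i) <-> V_Q(j)
        for i, j in enumerate(match):
            votes[i][j] += 1
    return votes

# Two hypothetical runs (xi = 2) on n = 4 nodes, 0-indexed.
results = [[0, 1, 2, 3], [0, 2, 1, 3]]
V = vote_matrix(results, 4)
```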

The Hungarian algorithm is again used to solve the optimal matching problem of this bipartite graph. Unlike the prediction process, which minimizes the system energy, the voting algorithm uses the following formulation to maximize the total number of votes:

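$$
\begin{aligned}
\max \ & \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij} \\
\text{s.t.} \ & \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n) \\
& \sum_{j=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n) \\
& \lambda_{ij} \in \{0, 1\}
\end{aligned}
$$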

where $v_{ij}$ is the number of votes in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q correspond one to one, that is, it represents the final matching result of the node pair. Solving this bipartite-graph optimal matching problem gives the final node-pair matching result shown in Fig. 6.
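To make the maximization concrete, the sketch below finds the vote-maximizing one-to-one matching by exhaustive search over permutations. This brute-force search is a stand-in chosen for brevity and is only practical for very small n; it is not the Hungarian algorithm named in the text, which would typically be applied to the negated voting matrix. The function name and the vote values are hypothetical.

```python
from itertools import permutations

def best_vote_matching(votes):
    """Return (max_total, match) maximizing the sum of votes[i][match[i]]."""
    n = len(votes)
    best = max(permutations(range(n)),
               key=lambda m: sum(votes[i][m[i]] for i in range(n)))
    return sum(votes[i][best[i]] for i in range(n)), list(best)

V = [[3, 0, 0], [0, 1, 2], [0, 2, 1]]  # hypothetical votes from xi = 3 runs
total, match = best_vote_matching(V)
```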

In one embodiment, a publicly available real data set of two networks, Twitter and Friendfeed, is used. The data set contains 155,804 users registered in both Twitter and Friendfeed, together with their identity correspondences. The Twitter data set contains 13,142,341 valid follow-relationship records, and the Friendfeed data set contains 5,939,687 valid friend-relationship records.

In the experiments, the data are divided into a training set and a test set, and the proportion of users with unknown identity correspondence is set to α, with α ∈ (0, 1). For example, α = 5% means that the identity correspondences of the remaining 5% of users are predicted given that 95% of the identity correspondences are known.
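The split by α and the accuracy measure can be sketched as follows. The text does not describe how the split is drawn or how accuracy is computed; this example assumes a uniform random split of the known pairs and defines accuracy as the fraction of test pairs whose predicted partner is the true one. The function names, the seed and the synthetic pairs are hypothetical.

```python
import random

def split_known_pairs(pairs, alpha, seed=0):
    """Hide a fraction alpha of the known identity pairs as the test set."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = round(alpha * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def accuracy(predicted, truth):
    """Fraction of test pairs whose predicted partner equals the true partner."""
    truth_map = dict(truth)
    hits = sum(1 for p, q in predicted if truth_map.get(p) == q)
    return hits / len(truth) if truth else 0.0

pairs = [(f"p{i}", f"q{i}") for i in range(100)]  # hypothetical ground truth
train, test = split_known_pairs(pairs, alpha=0.05)
```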

With the number of clusters K and the proportion of unknown data α fixed, the accuracy of the algorithm is tested, together with its scalability on data sets of different sizes. In the ten test cases carried out, the parameters are set to K = 6, α = 5% and ξ ∈ {20, 100}, as shown in Table 1 below:

Table 1: Test data

As the table above shows, the average accuracy of the proposed multi-layer network node identity prediction method on this real data set exceeds 90%, and the results are consistent across data sets of different sizes, which indicates that the method scales well.

As shown in Figure 7, the invention provides the identity recognition device 4 of identical name in a kind of Multi net voting, comprising: information acquisition unit 41, training set generation unit 42, matching unit 43, integrated unit 44.Information acquisition unit 41 obtains subscriber identity information in multiple network and user identity corresponding relation.Training set generation unit 42 using the subscriber identity information set of known users identity corresponding relation as training set; Build the minimum energy model based on user behavior similarity according to the described subscriber identity information in described training set, obtain energy factors and matching relationship sorter.

The matching unit 43 matches any two pieces of user identity information according to the matching relation classifier, fills in energies with the energy factors to form an energy matrix, and solves the energy matrix to obtain the matching result of a single prediction. The integration unit 44 integrates the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with identical names share the same identity.

In one embodiment, the training set generation unit 42 comprises a node pair feature building submodule. For any given node $V(i)$ in the 2 networks P and Q, the submodule builds the network topology feature vector $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents a piece of user identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including: node out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors. The submodule then builds the node pair feature vector, which for the 2 networks P and Q is:

$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$$

The matched node pairs are clustered according to this node pair feature vector, which gives the number of node pairs with similar features contained in each cluster category C. This number is treated as a particle characteristic, the distribution of the node pair features is used as the basis of the energy factors, and users with similar network behavior characteristics are given the same energy factor.
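The construction of the node pair feature vector can be sketched as follows. The sketch is only illustrative: it uses three of the listed features (out-degree, in-degree, neighbor count) on a directed edge list, and the function names and toy graphs are assumptions rather than part of the described device.

```python
def node_features(edges, node):
    """Return [out-degree, in-degree, neighbor count] for one node of a directed graph."""
    successors = {v for u, v in edges if u == node}
    predecessors = {u for u, v in edges if v == node}
    return [len(successors), len(predecessors), len(successors | predecessors)]

def pair_features(edges_p, node_p, edges_q, node_q):
    """Concatenate the feature vectors of a node in P and a node in Q."""
    return node_features(edges_p, node_p) + node_features(edges_q, node_q)

# Hypothetical toy follow graphs; an edge (u, v) means "u follows v".
edges_p = [("a", "b"), ("a", "c"), ("b", "a")]
edges_q = [("x", "y"), ("y", "x"), ("z", "x")]
print(pair_features(edges_p, "a", edges_q, "x"))  # [2, 1, 2, 1, 2, 2]
```

The first half of the printed vector describes node `a` in P and the second half describes node `x` in Q, mirroring the layout of the node pair feature vector.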

The training set generation unit 42 further comprises a classifier generation submodule, which builds the matched node pair energy model, where $\beta_i$ is the number of node pairs in the set obtained by clustering the matched node pairs with similar features, and $\varepsilon_i$ is the energy factor corresponding to that category. From the energy model, the energy factors corresponding to the cluster categories are obtained: $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, and these are used in the prediction process as the energy factor of the category to which a node pair belongs. K classifiers are built from the clustering result, and each node pair is given a category number.
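The clustering step and the per-cluster counts $\beta_i$ can be sketched as below. The text does not name a clustering algorithm, so plain k-means is used here purely as an assumption; the function names and the toy vectors are hypothetical, and the energy factors themselves are not computed.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def cluster_sizes(labels, k):
    """beta[c] = number of node pairs that fell into cluster c."""
    return [labels.count(c) for c in range(k)]

# Hypothetical pair feature vectors forming two well-separated groups.
vectors = [[1, 1], [1, 2], [2, 1], [9, 9], [9, 8], [8, 9], [10, 10]]
centroids, labels = kmeans(vectors, k=2)
beta = cluster_sizes(labels, 2)
print(sorted(beta))  # [3, 4]
```

The resulting centroids can also serve as a simple stand-in for the K classifiers, by assigning a new node pair to its nearest centroid.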

The training set generation unit 42 also comprises a node pair classification building submodule, which extracts the topological features of the nodes with unknown identity correspondence in networks P and Q respectively: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$. For any nodes $i \in P$ and $j \in Q$ with unknown identity correspondence, the $n \times n$ matched node pair feature vectors of all unknown nodes are built:

$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$$

The matched node pair feature vectors are classified by the K classifiers, which gives the class label of each node pair, and the node pair classification matrix is built from these labels.
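Building the node pair classification matrix can be sketched as follows. The form of the K classifiers is not specified in the text, so a nearest-centroid rule is assumed here; the function names, feature vectors and centroids are hypothetical.

```python
def nearest_class(vector, centroids):
    """Index of the centroid closest to the vector (squared Euclidean distance)."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(vector, centroids[c]))
    return min(range(len(centroids)), key=dist)

def class_matrix(features_p, features_q, centroids):
    """Entry [i][j] is the class label of the candidate pair (P node i, Q node j)."""
    return [[nearest_class(fp + fq, centroids) for fq in features_q]
            for fp in features_p]

# Hypothetical per-node feature vectors for two unknown nodes in each network,
# and three hypothetical cluster centroids learned on the training set.
features_p = [[1, 1], [5, 5]]
features_q = [[1, 2], [5, 4]]
centroids = [[1, 1, 1, 1], [5, 5, 5, 5], [1, 1, 5, 5]]
print(class_matrix(features_p, features_q, centroids))  # [[0, 2], [1, 1]]
```

Every one of the n × n candidate pairs receives a label, so the matrix has one row per unknown node in P and one column per unknown node in Q.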

The matching unit 43 fills the classification matrix with energy factors: each class label in the classification matrix is replaced with the energy factor $\varepsilon_{i=\text{category}}$ corresponding to that category, which yields the energy matrix. The matching unit then computes the optimal matching of the energy matrix.
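The energy filling step is a simple lookup, sketched below. The class matrix and the energy factor values are hypothetical numbers chosen for illustration only.

```python
def fill_energy(class_labels, energy_factors):
    """Replace every class label with the energy factor of that class."""
    return [[energy_factors[label] for label in row] for row in class_labels]

# Hypothetical class matrix (K = 3) and hypothetical energy factors.
class_labels = [[0, 2], [1, 1]]
energy_factors = [0.2, 0.9, 1.5]
print(fill_energy(class_labels, energy_factors))  # [[0.2, 1.5], [0.9, 0.9]]
```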

The matching unit 43 computes the optimal matching of this energy matrix by solving:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$

$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$

$$\lambda_{ij} \in \{0, 1\},$$

where $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q are in one-to-one correspondence: it is set to 1 if the correspondence holds and to 0 otherwise. The matching result is written $V_P(i) \leftrightarrow V_Q(j)$.
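The optimization above is a linear assignment problem. The text does not say which solver is used; one standard option is the Hungarian (Kuhn-Munkres) algorithm, sketched here under that assumption. The function name `solve_assignment` and the toy energy matrix are hypothetical.

```python
def solve_assignment(cost):
    """Return (total, match) where match[i] = j minimizes the sum of cost[i][j]."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return sum(cost[i][match[i]] for i in range(n)), match

# Hypothetical 3 x 3 energy matrix E; lower energy means a more plausible pair.
energy = [[9, 2, 15], [2, 15, 9], [15, 9, 9]]
print(solve_assignment(energy))  # (13, [1, 0, 2])
```

In the printed result, `match[i] = j` corresponds to $\lambda_{ij} = 1$, so each node in P is paired with exactly one node in Q and vice versa.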

The integration unit 44 obtains ξ prediction results and casts each prediction result as a vote in the node pair matching matrix, which gives the voting matrix V-Matrix $= (v_{ij})$. It then solves the optimal matching problem of this voting matrix V-Matrix, using the formula:

$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$

$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$

$$\lambda_{ij} \in \{0, 1\},$$

where $v_{ij}$ is the voting result in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q are in one-to-one correspondence, that is, the final matching result of the node pair.
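The voting and final matching can be sketched as follows. The exhaustive search over permutations is an assumption made to keep the example short and is only practical for very small n; an assignment solver would be used at realistic sizes. The function names and the five sample predictions are hypothetical.

```python
from itertools import permutations

def vote_matrix(predictions, n):
    """v[i][j] = number of single predictions that matched P node i to Q node j."""
    votes = [[0] * n for _ in range(n)]
    for match in predictions:
        for i, j in enumerate(match):
            votes[i][j] += 1
    return votes

def best_matching(votes):
    """Exhaustive search for the permutation with the most votes (tiny n only)."""
    n = len(votes)
    return max(permutations(range(n)),
               key=lambda perm: sum(votes[i][perm[i]] for i in range(n)))

# Hypothetical results of xi = 5 single predictions; match[i] = j.
predictions = [[0, 1, 2], [0, 2, 1], [0, 1, 2], [1, 0, 2], [0, 1, 2]]
votes = vote_matrix(predictions, 3)
print(votes)                       # [[4, 1, 0], [1, 3, 1], [0, 1, 4]]
print(list(best_matching(votes)))  # [0, 1, 2]
```

Individual predictions disagree on some pairs, but the matching that collects the most votes recovers a single consistent one-to-one correspondence.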

The identity recognition method and device for identical names in multiple networks of the invention can confirm whether records in multiple networks that carry different identity information but the same name refer to the same identity, that is, whether people with different identity information but the same name are the same person, which improves the accuracy of statistical results. The algorithm adopted is efficient and the computation is fast, and as the sample base grows, the accuracy of the results continues to improve.

The storage management method and system for a smart meter warehouse provided by the above embodiments use an optimized storage strategy to select a suitable storage strategy for different warehouse backgrounds. This not only makes effective use of storage space, improves the operating efficiency of the warehouse and reduces operating costs, but also brings many management benefits to the smart meter warehouse as a whole.

Those of ordinary skill in the art should understand that the foregoing is only specific embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall be included within the scope of protection of the invention.

Claims (10)

1. A method for identity recognition of identical names in multiple networks, characterized by comprising:
obtaining user identity information and user identity correspondences in multiple networks;
taking the set of user identity information with known user identity correspondences as a training set;
building a minimum energy model based on user behavior similarity from the user identity information in the training set, and obtaining energy factors and a matching relation classifier;
matching any two pieces of user identity information according to the matching relation classifier, filling in energies with the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction;
integrating the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with identical names share the same identity.
2. The method of claim 1, characterized in that taking the set of user identity information with known user identity correspondences as a training set, building a minimum energy model based on user behavior similarity from the user identity information in the training set, and obtaining energy factors and a matching relation classifier comprises:
for any given node $V(i)$ in 2 networks P and Q, the network topology feature vector is $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents a piece of user identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including: node out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors;
building the node pair feature vector, which for the 2 networks P and Q is:
$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\};$$
clustering the matched node pairs according to this node pair feature vector to obtain the number of node pairs with similar features contained in each cluster category C, treating this number as a particle characteristic, using the distribution of the node pair features as the basis of the energy factors, and giving users with similar network behavior characteristics the same energy factor;
building the matched node pair energy model, where $\beta_i$ is the number of node pairs in the set obtained by clustering the matched node pairs with similar features, and $\varepsilon_i$ is the energy factor corresponding to that category;
obtaining from the energy model the energy factors corresponding to the cluster categories, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, and using them in the prediction process as the energy factor of the category to which a node pair belongs;
building K classifiers from the clustering result, and giving each node pair a category number.
3. The method of claim 1, characterized in that matching any two pieces of user identity information according to the matching relation classifier, filling in energies with the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction comprises:
extracting the topological features of the nodes with unknown identity correspondence in networks P and Q respectively: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$;
for any nodes $i \in P$ and $j \in Q$ with unknown identity correspondence, building the $n \times n$ matched node pair feature vectors of all unknown nodes:
$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\};$$
classifying the matched node pair feature vectors with the K classifiers to obtain the class label of each node pair, and building the node pair classification matrix;
filling the classification matrix with energy factors by replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ corresponding to that category, thereby building the energy matrix;
computing the optimal matching of the energy matrix.
4. The method of claim 3, characterized in that
the optimal matching of the energy matrix is computed by solving:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q are in one-to-one correspondence: it is set to 1 if the correspondence holds and to 0 otherwise; the matching result is written $V_P(i) \leftrightarrow V_Q(j)$.
5. The method of claim 3 or 4, characterized in that integrating the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with identical names share the same identity comprises:
obtaining ξ prediction results and casting each prediction result as a vote in the node pair matching matrix to obtain the voting matrix V-Matrix $= (v_{ij})$;
solving the optimal matching problem of this voting matrix V-Matrix by:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $v_{ij}$ is the voting result in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q are in one-to-one correspondence, that is, the final matching result of the node pair.
6. A device for identity recognition of identical names in multiple networks, characterized by comprising:
an information acquisition unit, configured to obtain user identity information and user identity correspondences in multiple networks;
a training set generation unit, configured to take the set of user identity information with known user identity correspondences as a training set, build a minimum energy model based on user behavior similarity from the user identity information in the training set, and obtain energy factors and a matching relation classifier;
a matching unit, configured to match any two pieces of user identity information according to the matching relation classifier, fill in energies with the energy factors to form an energy matrix, and solve the energy matrix to obtain the matching result of a single prediction;
an integration unit, configured to integrate the matching results of multiple solutions to obtain the user identity correspondences and determine whether users with identical names share the same identity.
7. The device of claim 6, characterized in that:
the training set generation unit comprises:
a node pair feature building submodule, configured to build, for any given node $V(i)$ in 2 networks P and Q, the network topology feature vector $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents a piece of user identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including: node out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors; to build the node pair feature vector, which for the 2 networks P and Q is $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; and to cluster the matched node pairs according to this node pair feature vector to obtain the number of node pairs with similar features contained in each cluster category C, treat this number as a particle characteristic, use the distribution of the node pair features as the basis of the energy factors, and give users with similar network behavior characteristics the same energy factor;
a classifier generation submodule, configured to build the matched node pair energy model, where $\beta_i$ is the number of node pairs in the set obtained by clustering the matched node pairs with similar features, and $\varepsilon_i$ is the energy factor corresponding to that category; to obtain from the energy model the energy factors corresponding to the cluster categories, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, and use them in the prediction process as the energy factor of the category to which a node pair belongs; and to build K classifiers from the clustering result and give each node pair a category number.
8. The device of claim 6, characterized in that:
the training set generation unit further comprises:
a node pair classification building submodule, configured to extract the topological features of the nodes with unknown identity correspondence in networks P and Q respectively: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; to build, for any nodes $i \in P$ and $j \in Q$ with unknown identity correspondence, the $n \times n$ matched node pair feature vectors of all unknown nodes: $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$; and to classify the matched node pair feature vectors with the K classifiers to obtain the class label of each node pair and build the node pair classification matrix;
the matching unit is further configured to fill the classification matrix with energy factors by replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ corresponding to that category, thereby building the energy matrix, and to compute the optimal matching of the energy matrix.
9. The device of claim 8, characterized in that
the matching unit computes the optimal matching of the energy matrix by solving:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q are in one-to-one correspondence: it is set to 1 if the correspondence holds and to 0 otherwise; the matching result is written $V_P(i) \leftrightarrow V_Q(j)$.
10. The device of claim 8 or 9, characterized in that:
the integration unit is further configured to obtain ξ prediction results and cast each prediction result as a vote in the node pair matching matrix to obtain the voting matrix V-Matrix $= (v_{ij})$, and to solve the optimal matching problem of this voting matrix V-Matrix using the formula:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $v_{ij}$ is the voting result in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network P and node $j$ in network Q are in one-to-one correspondence, that is, the final matching result of the node pair.
CN201410719649.9A 2014-12-01 2014-12-01 Identity recognition method and device of identical names in multiple networks CN104462318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410719649.9A CN104462318A (en) 2014-12-01 2014-12-01 Identity recognition method and device of identical names in multiple networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410719649.9A CN104462318A (en) 2014-12-01 2014-12-01 Identity recognition method and device of identical names in multiple networks

Publications (1)

Publication Number Publication Date
CN104462318A true CN104462318A (en) 2015-03-25

Family

ID=52908353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410719649.9A CN104462318A (en) 2014-12-01 2014-12-01 Identity recognition method and device of identical names in multiple networks

Country Status (1)

Country Link
CN (1) CN104462318A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227352A (en) * 2015-09-02 2016-01-06 新浪网技术(中国)有限公司 A kind of update method of user ID collection and device
CN106529110A (en) * 2015-09-09 2017-03-22 阿里巴巴集团控股有限公司 Classification method and equipment of user data
CN107330459A (en) * 2017-06-28 2017-11-07 联想(北京)有限公司 A kind of data processing method, device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120041979A1 (en) * 2010-08-12 2012-02-16 Industry-Academic Cooperation Foundation, Yonsei University Method for generating context hierarchy and system for generating context hierarchy
CN102651030A (en) * 2012-04-09 2012-08-29 华中科技大学 Social network association searching method based on graphics processing unit (GPU) multiple sequence alignment algorithm

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120041979A1 (en) * 2010-08-12 2012-02-16 Industry-Academic Cooperation Foundation, Yonsei University Method for generating context hierarchy and system for generating context hierarchy
CN102651030A (en) * 2012-04-09 2012-08-29 华中科技大学 Social network association searching method based on graphics processing unit (GPU) multiple sequence alignment algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
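XIA Hu: "Research on the structure and behavior of mobile social networks and its applications" (移动社交网络结构和行为研究及其应用), China Doctoral Dissertations Full-text Database, Information Science and Technology *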

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227352A (en) * 2015-09-02 2016-01-06 新浪网技术(中国)有限公司 A kind of update method of user ID collection and device
CN105227352B (en) * 2015-09-02 2019-03-19 新浪网技术(中国)有限公司 A kind of update method and device of user identifier collection
CN106529110A (en) * 2015-09-09 2017-03-22 阿里巴巴集团控股有限公司 Classification method and equipment of user data
CN107330459A (en) * 2017-06-28 2017-11-07 联想(北京)有限公司 A kind of data processing method, device and electronic equipment

Similar Documents

Publication Publication Date Title
Li et al. Agglomerative fuzzy k-means clustering algorithm with selection of number of clusters
Zhang et al. CAP: Community activity prediction based on big data analysis
Liu et al. Link prediction in complex networks: A local naïve Bayes model
Dong et al. Inferring user demographics and social strategies in mobile social networks
Yang et al. Community mining from signed social networks
CN102687169B (en) The method and apparatus creating platform is provided
Amiri et al. Community detection in complex networks: Multi–objective enhanced firefly algorithm
Chiu et al. An intelligent market segmentation system using k-means and particle swarm optimization
Xiang et al. Modeling relationship strength in online social networks
Liao et al. An enhanced consensus reaching process in group decision making with intuitionistic fuzzy preference relations
CN101520878A (en) Method, device and system for pushing advertisements to users
Shaw et al. Learning a distance metric from a network
Karim et al. Decision tree and naive bayes algorithm for classification and generation of actionable knowledge for direct marketing
Roszkowska Multi-criteria decision making models by applying the TOPSIS method to crisp and interval data
Ma et al. A highly accurate prediction algorithm for unknown web service QoS values
Khalid et al. OmniSuggest: A ubiquitous cloud-based context-aware recommendation system for mobile social networks
Nadaban et al. Fuzzy topsis: A general view
CN104731962B (en) Friend recommendation method and system based on similar corporations in a kind of social networks
Lin et al. Website reorganization using an ant colony system
Ren et al. Intuitionistic multiplicative analytic hierarchy process in group decision making
Li et al. A comparative analysis of evolutionary and memetic algorithms for community detection from signed social networks
Zhang et al. User community discovery from multi-relational networks
US9514248B1 (en) System to group internet devices based upon device usage
Alzahrani et al. Community detection in bipartite networks: Algorithms and case studies
Son et al. Content-based filtering for recommendation systems using multiattribute networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150325