CN104376116A - Search method and device for figure information - Google Patents

Search method and device for figure information

Info

Publication number
CN104376116A
Authority
CN
China
Prior art keywords
node
energy
identity
matrix
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410720437.2A
Other languages
Chinese (zh)
Inventor
王晶华
陈晰
郭光�
谢乃博
魏明磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Hebei Electric Power Co Ltd
Hengshui Power Supply Co of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Hebei Electric Power Co Ltd
Hengshui Power Supply Co of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, State Grid Hebei Electric Power Co Ltd, Hengshui Power Supply Co of State Grid Hebei Electric Power Co Ltd
Priority to CN201410720437.2A
Publication of CN104376116A

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a search method and device for figure information. The search method includes the steps that figure identity information in multiple networks and figure identity corresponding relations are obtained; the figure identity information of the known figure identity corresponding relations is gathered to serve as a training set; an energy minimum model based on the figure behavior similarity degree is constructed, and then energy factors and a matching relation classifier are obtained; energy filling is performed through the energy factors to form an energy matrix, and a matching result predicted at one time is obtained through solving the energy matrix; matching results obtained through repeated solving are integrated, so that the figure identity corresponding relations are obtained, and the identity uniformity of figures with the same name is determined; an input name is received, and the identity information, in different networks, of the same figure is displayed in one webpage. The method and device can determine the uniformity of the information with different identities but the same name in the multiple networks, and the adopted algorithm is efficient and calculation is rapid in the process of displaying the identity information, in different networks, of the same figure in the webpage.

Description

Search method and device for person information
Technical field
The present invention relates to the technical field of internet search engines, and in particular to a search method and device for person information.
Background
Using search engines to retrieve information about people is one of the main activities of internet users. In the real world, however, it is very common for several people to share one name, or for the same person to register different identity information in different networks. In scientific research, for example, many researchers work in several research teams at the same time, so the personal information that one person supplies when publishing academic results, such as e-mail address, affiliation, and postal address, may differ from one publication to another; that is, the same name carries different identity information. When the academic results of a field are aggregated, it is difficult to judge whether identical names refer to the same person, and the resulting redundant information directly reduces the accuracy of the statistics. Because researchers work in different teams, the personal information of one researcher may appear in several networks, for example a university website, a paper repository, a technology transfer site, or a patent trading site, and that information is not necessarily the same in each network.
At retrieval time, the results page simply lists hits and ignores the fact that a person may be linked across several networks. A person may, for instance, be active at the same time in universities, in research institutes of state-owned enterprises, and in social research institutions, with a different identity, social circle, and research focus in each social network. Behavior analysis methods designed for a single network cannot be applied to this multilayer network environment. In multiple networks, the nodes of each network may have quite different attributes, and there are association relationships, such as interdependence and cooperation, between the networks and between their nodes. At present, the page displayed for a query is a simple list that is not organized by the characteristics of the person, so users cannot get a clear picture of the person they are interested in.
Summary of the invention
In view of this, the object of the invention is to provide a search method and device for person information that can determine whether identity information that differs across multiple networks but carries the same name belongs to the same person, and can display it accordingly.
Based on the above object, the invention provides a search method for person information, comprising: obtaining person identity information and person identity correspondences in multiple networks; taking the set of person identity information with known identity correspondences as a training set; building, from the person identity information in the training set, a minimum energy model based on the similarity of person behavior, and obtaining energy factors and a matching-relation classifier; matching any two pieces of person identity information according to the matching-relation classifier, filling in energies with the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction; integrating the matching results of repeated solving to obtain the person identity correspondences and to determine whether persons with the same name are the same person; receiving an input name and obtaining the person identity information corresponding to that name from the multiple networks; and, according to the determined identity of persons with the same name, displaying in one webpage the identity information of the same person in the different networks, wherein the identity information comprises: e-mail, telephone, and affiliation.
According to one embodiment of the present invention, further, taking the set of person identity information with known identity correspondences as a training set, building the minimum energy model based on person behavior similarity from the person identity information in the training set, and obtaining the energy factors and the matching-relation classifier comprises: for any given node $V(i)$ in two networks $P$ and $Q$, its network topology feature vector is $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents person identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including: out-degree, in-degree, clustering coefficient, neighbor nodes, average degree, and common neighbors; establishing node-pair feature vectors, where the node-pair feature vector for the two networks $P$ and $Q$ is $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; clustering the matched node pairs according to this node-pair feature vector to obtain the number of node pairs with similar features in each cluster category $C$, which is taken as the particle feature, taking the distribution of node pairs over the features as the energy factor, and giving persons with similar network behavior features the same energy factor; building the matched-node-pair energy model $\min H = \sum_{i=1}^{k} \beta_i \varepsilon_i$, where $\beta_i$ is the number of node pairs in category $i$ after the set of matched node pairs is clustered by feature, and $\varepsilon_i$ is the energy factor corresponding to that category; obtaining from the energy model the energy factor of each cluster category, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k\}$, and using it as the energy factor of the category a node pair belongs to during prediction; and establishing a K-classifier from the clustering result and assigning each node pair a category number.
According to one embodiment of the present invention, further, matching any two pieces of person identity information according to the matching-relation classifier, filling in energies with the energy factors to form the energy matrix, and solving the energy matrix to obtain the matching result of a single prediction comprises: extracting the topological features of the nodes in networks $P$ and $Q$ whose identity correspondence is unknown: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; for any nodes $i \in P$, $j \in Q$ with unknown identity correspondence, building the $n \times n$ matched-node-pair feature vectors of all unknown nodes: $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$; classifying the node-pair feature vectors with the K-classifier to obtain the category label of each node pair, and building the node-pair classification matrix; filling the classification matrix with energy factors by replacing each category label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ of that category, thereby building the energy matrix; and computing the optimal matching of the energy matrix.
According to one embodiment of the present invention, further, the optimal matching of the energy matrix is computed as:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $\lambda_{ij}$ indicates whether node $i$ in network $P$ and node $j$ in network $Q$ are in one-to-one correspondence: it is 1 if the correspondence holds and 0 otherwise; the matching result is expressed as
According to one embodiment of the present invention, further, integrating the matching results of repeated solving to obtain the correspondence of person identity information and to determine whether persons with the same name are the same person comprises: obtaining $\xi$ prediction results, and casting each prediction result as votes in the node-pair matching matrix to obtain the voting matrix V-Matrix $= (V_{ij})$; and solving the optimal matching problem of the voting matrix V-Matrix using:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $v_{ij}$ is the vote count in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network $P$ and node $j$ in network $Q$ are in one-to-one correspondence, that is, it represents the final matching result of the node pair.
Based on the above object, the invention provides a search device for person information, comprising: an information acquisition unit, configured to receive an input name, obtain webpages containing that name, deduplicate the webpages, and extract from them the person identity information corresponding to the name and the person identity correspondences; and a person identity display unit, configured to determine whether persons with the same name are the same person and to display in one webpage the identity information of the same person in different networks, wherein the identity information comprises: e-mail, telephone, and affiliation.
According to one embodiment of the present invention, the device further comprises: a training set generation unit, configured to take the set of person identity information with known identity correspondences as a training set, build from the person identity information in the training set a minimum energy model based on person behavior similarity, and obtain energy factors and a matching-relation classifier; a matching unit, configured to match any two pieces of person identity information according to the matching-relation classifier, fill in energies with the energy factors to form an energy matrix, and solve the energy matrix to obtain the matching result of a single prediction; and an integration unit, configured to integrate the matching results of repeated solving, obtain the person identity correspondences, and determine whether persons with the same name are the same person.
According to one embodiment of the present invention, further, the training set generation unit comprises: a node-pair feature building submodule, configured to establish, for any given node $V(i)$ in two networks $P$ and $Q$, its network topology feature vector $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents person identity information and $f_1, \ldots, f_d$ are basic attribute features of the node, including: out-degree, in-degree, clustering coefficient, neighbor nodes, average degree, and common neighbors; to establish node-pair feature vectors, where the node-pair feature vector for the two networks $P$ and $Q$ is $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; and to cluster the matched node pairs according to this node-pair feature vector to obtain the number of node pairs with similar features in each cluster category $C$, which is taken as the particle feature, the distribution of node pairs over the features being taken as the energy factor and persons with similar network behavior features being given the same energy factor; and a classifier generation submodule, configured to build the matched-node-pair energy model $\min H = \sum_{i=1}^{k} \beta_i \varepsilon_i$, where $\beta_i$ is the number of node pairs in category $i$ after the set of matched node pairs is clustered by feature and $\varepsilon_i$ is the energy factor corresponding to that category; to obtain from the energy model the energy factor of each cluster category, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k\}$, and use it as the energy factor of the category a node pair belongs to during prediction; and to establish a K-classifier from the clustering result and assign each node pair a category number.
According to one embodiment of the present invention, further, the training set generation unit also comprises: a node-pair classification building submodule, configured to extract the topological features of the nodes in networks $P$ and $Q$ whose identity correspondence is unknown: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; for any nodes $i \in P$, $j \in Q$ with unknown identity correspondence, to build the $n \times n$ matched-node-pair feature vectors of all unknown nodes: $F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$; and to classify the node-pair feature vectors with the K-classifier, obtain the category label of each node pair, and build the node-pair classification matrix. The matching unit is further configured to fill the classification matrix with energy factors by replacing each category label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ of that category, thereby building the energy matrix, and to compute the optimal matching of the energy matrix.
According to one embodiment of the present invention, further, the matching unit computes the optimal matching of the energy matrix as:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $\lambda_{ij}$ indicates whether node $i$ in network $P$ and node $j$ in network $Q$ are in one-to-one correspondence: it is 1 if the correspondence holds and 0 otherwise; the matching result is expressed as
According to one embodiment of the present invention, further, the integration unit is also configured to obtain $\xi$ prediction results, cast each prediction result as votes in the node-pair matching matrix to obtain the voting matrix V-Matrix $= (V_{ij})$, and solve the optimal matching problem of the voting matrix V-Matrix using:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n),$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\},$$
where $v_{ij}$ is the vote count in row $i$, column $j$ of the voting matrix, and $\lambda_{ij}$ indicates whether node $i$ in network $P$ and node $j$ in network $Q$ are in one-to-one correspondence, that is, it represents the final matching result of the node pair.
As can be seen from the above, the search method and device for person information of the invention can confirm whether identity information that differs across multiple networks but carries the same name belongs to the same person, which improves the accuracy of statistics. In addition, the adopted algorithm is efficient and computes quickly, and the accuracy of the results continues to improve as the sample library grows.
Brief description of the drawings
Fig. 1 is a flowchart of one embodiment of the search method for person information of the invention;
Fig. 2 is a flowchart of another embodiment of the search method for person information of the invention;
Fig. 3 is a schematic diagram of the node-pair classification matrix for arbitrary nodes of a two-layer network in the search method for person information of the invention;
Fig. 4 is a schematic diagram of energy factor filling in the energy matrix of the search method for person information of the invention;
Fig. 5 is a schematic diagram of the voting matrix when ξ = 2 in the search method for person information of the invention;
Fig. 6 is a schematic diagram of the voting and integration algorithm process of the search method for person information of the invention;
Fig. 7 is a schematic diagram of one embodiment of the search device for person information of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a flowchart of one embodiment of the search method for person information of the invention. As shown in Fig. 1:
Step 101: obtain person identity information and person identity correspondences in multiple networks.
Step 102: take the set of person identity information with known identity correspondences as a training set.
Step 103: build, from the person identity information in the training set, a minimum energy model based on person behavior similarity, and obtain energy factors and a matching-relation classifier.
Step 104: match any two pieces of person identity information according to the matching-relation classifier, fill in energies with the energy factors to form an energy matrix, and solve the energy matrix to obtain the matching result of a single prediction.
Step 105: integrate the matching results of repeated solving to obtain the person identity correspondences and determine whether persons with the same name are the same person.
Step 106: receive an input name and obtain webpages containing that name; deduplicate the webpages and extract from them the person identity information corresponding to the name and the person identity correspondences; after determining whether persons with the same name are the same person, display in one webpage the identity information of the same person in different networks, wherein the identity information comprises: e-mail, telephone, and affiliation.
The key to determining whether same-name individuals in multiple networks are the same person is to find the node correspondence of an individual with multiple identities across the networks, that is, the node matching problem between networks. The topology information of the different networks, together with the large volume of interaction records between individuals kept by internet services such as social networking sites, makes it possible to solve the matching problem between network layers. For example, nodes with the same identity can be identified across networks to some extent by degree, clustering coefficient, neighbor structure, common friends, and so on.
Fig. 2 is a flowchart of another embodiment of the search method for person information of the invention. As shown in Fig. 2:
Steps 201-208 are model construction: the set of persons with known identity correspondences is taken as the training set. From the known node-pair information, on the one hand a minimum energy model based on person behavior similarity is built to obtain the energy factors corresponding to the node pairs; on the other hand, a node-pair matching-relation classifier is trained to guide the matching of node pairs whose matching relation is unknown.
Steps 210-216 are node matching: any two nodes are matched using the classifier from the model construction stage, energies are filled in with the energy factors, and the node-pair matching result of a single prediction is obtained by solving the energy minimization problem on the filled matrix.
Steps 217-219 are voting integration: the prediction results of repeated node matching runs are integrated to obtain the final correspondence of person identities in the multilayer network, which is then used to judge whether identical names refer to the same person.
The Ising model is a model that describes phase transitions of matter. After a phase transition, a material exhibits new structure and physical properties. Systems that undergo phase transitions are generally systems with strong interactions between molecules, also called cooperative systems. In the model construction stage, the principle of the Ising model is applied to node matching between two networks: the topological features $f$ of the network nodes are extracted and the matched-node-pair feature vector $F_{P \leftrightarrow Q}$ is built.
The matched node pairs are clustered according to this feature vector to obtain the number of node pairs with similar features in each cluster category $C$, which is taken as the particle feature. The distribution of node pairs over the features is taken as the magnetic probability (energy factor), and groups of persons with similar network behavior features are given the same energy factor. Following the energy criterion of the spin model, the total energy of the system is assumed to be lowest when all node pairs in the two-layer network are matched correctly, and the matched-node-pair energy model is built accordingly:
$$\min H = \sum_{i=1}^{k} \beta_i \varepsilon_i \tag{1}$$
where $\beta_i$ is the number of node pairs in category $i$ after the set of matched node pairs is clustered by feature, and $\varepsilon_i$ is the energy factor corresponding to that category.
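As a purely illustrative reading of model (1), assume $k = 3$ cluster categories with node-pair counts $\beta = (5, 3, 2)$ and energy factors $\varepsilon = (0.1, 0.4, 0.9)$. These numbers are hypothetical and do not come from the patent; they only show how the sum is formed:

```latex
H = \sum_{i=1}^{3} \beta_i \varepsilon_i
  = 5(0.1) + 3(0.4) + 2(0.9)
  = 0.5 + 1.2 + 1.8
  = 3.5
```

In the method itself, the $\beta_i$ come from clustering the training node pairs and the $\varepsilon_i$ are the quantities to be determined.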
The energy factor of each cluster category, $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k\}$, is obtained by a nonlinear optimization method and is used as the energy factor of the category that a node pair belongs to during prediction. A K-classifier (CLASSIFIER) is built from the clustering result above, and each node pair is assigned a category number.
In one embodiment, the prerequisite for building the energy model is a vectorized representation of the network nodes. For any given node $V(i)$, its network topology feature vector is defined as $f(i) = \{f_1, f_2, \ldots, f_d\}$, where $f_1, \ldots, f_d$ may be basic attribute features of the node, such as out-degree, in-degree, clustering coefficient, neighbor nodes, and average degree, or extended attribute features, such as the common neighbors of two nodes or their Jaccard coefficient.
On this basis, the structural feature vector of a node pair in a multilayer network is the set of basic and extended attribute features. For two networks, or a two-layer network, a node pair can be expressed as the vector:
$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}.$$
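The feature vectors described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes each network is given as a list of directed edges, uses only three of the listed features (out-degree, in-degree, clustering coefficient), and the function names `topology_features` and `pair_features` are hypothetical.

```python
from collections import defaultdict


def topology_features(edges, node):
    """Return [out_degree, in_degree, clustering_coefficient] for one node.

    `edges` is a list of directed (source, target) pairs (an assumed input format).
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)
    # Treat the graph as undirected when collecting the neighbourhood.
    neighbours = (succ[node] | pred[node]) - {node}
    k = len(neighbours)
    # Count neighbour pairs that are themselves linked in either direction.
    links = sum(
        1
        for x in neighbours
        for y in neighbours
        if x < y and (y in succ[x] or x in succ[y])
    )
    clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
    return [len(succ[node]), len(pred[node]), clustering]


def pair_features(edges_p, node_p, edges_q, node_q):
    """Concatenate the features of a node in P and a node in Q into one pair vector."""
    return topology_features(edges_p, node_p) + topology_features(edges_q, node_q)


# Toy networks (hypothetical data).
network_p = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
network_q = [("x", "y"), ("y", "x")]
print(pair_features(network_p, "a", network_q, "x"))
```

Concatenating the two per-node vectors mirrors the form of $F_{P \leftrightarrow Q}$; extended features such as common neighbors or the Jaccard coefficient could be appended in the same way.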
In one embodiment, in the node matching process for two networks (or a two-layer network), the node-pair classification matrix contains every possible match between nodes of the two layers. Finding the correct one-to-one matching among these many candidates turns the identity prediction problem for node pairs into an optimal matching problem on a bipartite graph: an optimization algorithm must leave exactly one match in every row and every column of the matrix.
Under the minimum energy model, to make the global energy of the matching result as small as possible, the classification matrix is first filled with energy factors: each category label in the matrix is replaced with the energy factor of that category, giving the energy matrix shown in Fig. 4. In this $n \times n$ energy matrix, the goal of the method is to find the $n$ best matches that minimize the system energy. Many algorithms can be used, for example the Hungarian algorithm.
The Hungarian algorithm is one of many algorithms for solving the linear assignment problem and is a classic algorithm for the maximum matching problem on bipartite graphs. Let $G = (V, E)$ be an undirected graph. If the vertex set $V$ can be partitioned into two disjoint subsets $V_1$ and $V_2$ such that the two endpoints of every edge belong to different subsets, then $G$ is called a bipartite graph, also written $G = (V_1, V_2, E)$. Given a bipartite graph $G$ and a subgraph $M$ of $G$, if no two edges in the edge set of $M$ share a vertex, then $M$ is called a matching. Selecting the matching with the largest number of edges is called the maximum matching problem of the graph. If every vertex of the graph is incident to some edge of the matching, the matching is called a complete matching, also known as a perfect matching.
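The definitions above can be made concrete with a short sketch of maximum bipartite matching by augmenting paths (often called Kuhn's algorithm). This is the unweighted matching problem only, not the weighted assignment solved later; the adjacency-list input format and the function name `max_bipartite_matching` are assumptions made for illustration.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum matching in a bipartite graph by augmenting paths.

    adj[u] lists the right-side vertices adjacent to left vertex u.
    Returns (size, match_right), where match_right[v] is the left vertex
    matched to right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(u, seen):
        # Try to match u, re-routing earlier matches along an augmenting path.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = sum(1 for u in range(len(adj)) if try_augment(u, set()))
    return size, match_right


# Left vertices 0..2, right vertices 0..2 (toy graph).
adjacency = [[0, 1], [0], [1, 2]]
size, match_right = max_bipartite_matching(adjacency, 3)
print(size, match_right)
```

When the returned size equals the number of vertices on each side, every vertex is covered and the matching is perfect in the sense defined above.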
For two networks, the topological features of the nodes whose identity correspondence is unknown are first extracted: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$.
For any nodes $i \in P$, $j \in Q$, the $n \times n$ feature vectors of all possible matched node pairs are built:
$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$$
The K-classifier classifies these vectors and yields the category label of each node pair, from which the node-pair classification matrix shown in Fig. 3 is built.
The node-pair classification matrix contains every possible match between nodes of the two layers. Finding the correct one-to-one matching among these many candidates turns the identity prediction problem for node pairs into an optimal matching problem on a bipartite graph: an optimization algorithm must leave exactly one match in every row and every column of the matrix. Under the minimum energy model, to make the global energy of the matching result as small as possible, the classification matrix is first filled with energy factors: each category label in the matrix is replaced with the energy factor $\varepsilon_{i=\text{category}}$ of that category, and the energy matrix shown in Fig. 4 is then built.
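The energy filling step amounts to a table lookup, as the following minimal sketch shows. The label values, the energy factors, and the function name `fill_energy` are hypothetical and chosen only for illustration.

```python
def fill_energy(class_matrix, energy_factors):
    """Replace every category label in the matrix by that category's energy factor."""
    return [[energy_factors[label] for label in row] for row in class_matrix]


# Hypothetical 3 x 3 classification matrix with labels from K = 3 categories.
class_matrix = [
    [0, 2, 1],
    [1, 0, 2],
    [2, 1, 0],
]
# Hypothetical energy factor per category.
energy_factors = {0: 0.1, 1: 0.5, 2: 0.9}

energy_matrix = fill_energy(class_matrix, energy_factors)
for row in energy_matrix:
    print(row)
```

Row $i$ and column $j$ of the result hold the energy assigned to matching node $i$ of network $P$ with node $j$ of network $Q$.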
In the energy matrix, the goal is to find the $n$ best matches that minimize the system energy. The Hungarian algorithm is used to compute this optimal matching; its mathematical model is:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n)$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n)$$
$$\lambda_{ij} \in \{0, 1\}$$
where $\lambda_{ij}$ indicates whether node $i$ in network $P$ and node $j$ in network $Q$ are in one-to-one correspondence: it is 1 if the correspondence holds and 0 otherwise. Without loss of generality, the matching result is expressed as
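A compact sketch of the minimization above, using the widely published potentials-based $O(n^3)$ formulation of the Hungarian algorithm, is given below. It is an illustration under assumptions, not the patent's own code: the energy values are hypothetical and the function name `hungarian` is chosen here.

```python
def hungarian(cost):
    """Solve the square assignment problem, minimising total cost.

    Returns (assignment, total), where assignment[i] is the column
    (node of Q) matched to row i (node of P).
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j]: row currently matched to column j, 0 if none
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matches along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


# Hypothetical 3 x 3 energy matrix E.
energy = [
    [0.4, 0.1, 0.3],
    [0.2, 0.0, 0.5],
    [0.3, 0.2, 0.2],
]
assignment, total = hungarian(energy)
print(assignment, total)  # assignment is [1, 0, 2]; total energy is about 0.5
```

Setting $\lambda_{ij} = 1$ exactly when `assignment[i] == j` recovers the 0/1 variables of the model, with one entry per row and per column.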
In one embodiment, in the voting integration stage, the clustering algorithm chooses its cluster centers at random, so the clustering result is not guaranteed to be globally optimal, which makes a single prediction result uncertain. To obtain more stable and accurate results, the invention introduces an integration algorithm that produces the final prediction through voting and a second matching optimization.
For given data to be predicted, the identity correspondence prediction algorithm is first run $\xi$ times to obtain $\xi$ prediction results, and each prediction result is then cast as votes in the node-pair matching matrix to obtain the voting matrix V-Matrix $= (V_{ij})$, as shown in Fig. 5. For example, with $\xi = 2$ there are two matching results. If the first matching result yields the correspondences behind entries $V_{11}$, $V_{22}$, $V_{33}$, and $V_{nn}$, those entries are set to 1. If the second matching result yields the correspondences behind entries $V_{11}$, $V_{23}$, $V_{32}$, and $V_{nn}$, then $V_{23}$ and $V_{32}$ are set to 1, and $V_{11}$ and $V_{nn}$, which have already received a vote, are increased by 1.
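The voting step can be sketched as a simple tally. The example mirrors the $\xi = 2$ case above with $n = 4$ and 0-based indices; representing each prediction as a list where position $i$ holds the matched column $j$, and the function name `vote_matrix`, are assumptions for illustration.

```python
def vote_matrix(predictions, n):
    """Tally how often each (i, j) pair was matched across all prediction runs."""
    votes = [[0] * n for _ in range(n)]
    for matching in predictions:
        for i, j in enumerate(matching):
            votes[i][j] += 1
    return votes


# Two runs (xi = 2) over n = 4 nodes; matching[i] is the node of Q matched to node i of P.
first_run = [0, 1, 2, 3]
second_run = [0, 2, 1, 3]

votes = vote_matrix([first_run, second_run], 4)
for row in votes:
    print(row)
```

Entries on which both runs agree (the first and last diagonal entries here) end up with 2 votes, and entries chosen by only one run have 1 vote.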
The Hungarian algorithm is applied again to solve the optimal matching problem of this bipartite graph. Unlike the prediction step, which minimizes the system energy value, the voting step maximizes the total number of votes using the following model:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n)$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n)$$
$$\lambda_{ij} \in \{0, 1\}$$
where $v_{ij}$ is the vote count in row i, column j of the voting matrix, and $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence, i.e. it represents the final matching result of the node pair. Solving this optimal bipartite matching problem yields the final node-pair matching result shown in Fig. 6.
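For small n, the maximization model can be checked with an exhaustive search over all one-to-one assignments. The sketch below is a brute-force stand-in, not the Hungarian algorithm named in the text, and is only practical for very small matrices; the vote matrix and the function name `max_vote_matching` are hypothetical.

```python
from itertools import permutations


def max_vote_matching(votes):
    """Return the one-to-one assignment with the largest total vote count.

    Brute force over all n! permutations; a stand-in for the Hungarian
    algorithm that is only feasible for small n.
    """
    n = len(votes)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(votes[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical voting matrix after xi = 3 runs on n = 3 nodes.
votes = [
    [3, 0, 0],
    [0, 2, 1],
    [0, 1, 2],
]
final_match = max_vote_matching(votes)
total_votes = sum(votes[i][final_match[i]] for i in range(len(votes)))
```

An equivalent approach is to negate the vote matrix and solve the resulting minimization problem with any minimum-cost assignment solver.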
In one embodiment, a public real-world dataset covering two networks, Twitter and Friendfeed, is used. The dataset contains 155,804 people registered on both Twitter and Friendfeed, together with their identity correspondences. The Twitter data contain 13,142,341 valid follow-relationship records, and the Friendfeed data contain 5,939,687 valid friend-relationship records.
In the experiment, the data are divided into a training set and a test set, and the proportion of people whose identity correspondence is unknown is set to α, with α ∈ (0, 1). For example, α = 5% means that the identity correspondences of the remaining 5% of people are predicted given that 95% of the identity correspondences are known.
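A minimal sketch of such a split is shown below, assuming that people are identified by integer ids and that the unknown portion is drawn uniformly at random. The function name `split_known_unknown` and the fixed seed are illustrative assumptions; the patent does not describe how the split is sampled.

```python
import random


def split_known_unknown(person_ids, alpha, seed=0):
    """Split ids into a known (training) part and an unknown (test) part.

    alpha is the fraction of people whose identity correspondence is
    treated as unknown and has to be predicted.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    rng = random.Random(seed)      # fixed seed only for reproducibility
    shuffled = list(person_ids)
    rng.shuffle(shuffled)
    n_unknown = round(alpha * len(shuffled))
    return shuffled[n_unknown:], shuffled[:n_unknown]


# alpha = 5%: 95% of the correspondences are known, 5% are predicted.
train_ids, test_ids = split_known_unknown(range(1000), alpha=0.05)
```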
With the number of clusters K and the unknown-data proportion α fixed, the accuracy of the algorithm and its scalability across datasets of different sizes are tested. In the ten test cases carried out, the parameters are set to K = 6, α = 5% and ξ ∈ {20, 100}, as shown in Table 1 below:
Table 1: Test data
As the table above shows, the multi-layer network node identity prediction method proposed by the invention achieves an average accuracy above 90% on this real-world dataset and gives consistent results across datasets of different sizes, indicating that the method scales well.
As shown in Fig. 7, the invention provides a search device 4 for person information, comprising: an information acquisition unit 41, a training set generation unit 42, a matching unit 43, an integration unit 44 and a search unit 45. The information acquisition unit 41 obtains person identity information and person identity correspondences from multiple networks.
The training set generation unit 42 takes the set of person identity information with known identity correspondences as the training set. From the person identity information in the training set, it builds a minimum-energy model based on the similarity of people's behavior and obtains the energy factors and the matching-relationship classifier.
The matching unit 43 matches any two pieces of person identity information according to the matching-relationship classifier, fills in energy values using the energy factors to form an energy matrix, and solves the energy matrix to obtain the matching result of a single prediction. The integration unit 44 integrates the matching results of repeated solutions, obtains the person identity correspondences, and determines whether people with the same name are the same individual.
The search unit 45 receives an input name and obtains the person identity information corresponding to that name from the multiple networks. According to the identity determination for people with the same name, it displays the identity information of the same person in the different networks on a single web page. The identity information includes e-mail address, telephone number, employer, and the like.
In one embodiment, the training set generation unit 42 comprises a node-pair feature building submodule. For any given node V(i) in the two networks P and Q, the submodule builds the network topology feature vector $f(i) = \{f_1, f_2, \ldots, f_d\}$, where a node represents a person's identity information and $f_1, \ldots, f_d$ are the basic attribute features of the node, including out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors. It then builds the node-pair feature vector, which for the two networks P and Q is: $$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\}$$ The matched node pairs are clustered according to this node-pair feature vector, which gives the number of node pairs with similar features in each cluster class C. This number is taken as the particle feature, the distribution of the node-pair features is used as the energy factor, and people with similar network behavior are assigned the same energy factor.
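As a simplified illustration of the feature construction, the sketch below computes only two of the listed features (out-degree and in-degree) from a directed edge list and concatenates the vectors of a node in P and a node in Q into a node-pair feature vector. The edge lists, node names and function names are hypothetical, and the remaining features named in the text are omitted for brevity.

```python
from collections import defaultdict


def degree_features(edges):
    """Map each node to [out_degree, in_degree] for a directed edge list."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    return {v: [out_deg[v], in_deg[v]] for v in nodes}


def pair_feature(f_p, f_q):
    """Concatenate the features of a node in P and a node in Q."""
    return f_p + f_q


# Hypothetical follow relations in network P and network Q.
edges_p = [("a", "b"), ("a", "c"), ("b", "a")]
edges_q = [("x", "y"), ("x", "z"), ("y", "x")]

feats_p = degree_features(edges_p)
feats_q = degree_features(edges_q)
pair = pair_feature(feats_p["a"], feats_q["x"])
```

With m features per node, each node-pair vector has 2m components, matching the form of $F_{P \leftrightarrow Q}$.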
The training set generation unit 42 comprises a classifier generation submodule, which builds the energy model of the matched node pairs, in which $\beta_i$ is the number of node pairs in class i after the matched node pairs with similar features have been clustered by feature, and $\varepsilon_i$ is the energy factor of that class. From the energy model, the energy factor of each cluster class is obtained: $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, and these values are used as the energy factors of the classes to which node pairs belong during prediction. K classifiers are built from the clustering result, and each node pair is assigned a class number.
The training set generation unit 42 also comprises a node-pair classification building submodule, which extracts the topology features of the nodes with unknown identity correspondence in networks P and Q: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$. For any nodes i ∈ P and j ∈ Q with unknown identity correspondence, it builds the n × n matched node-pair feature vectors of all unknown nodes: $$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\}$$ The matched node-pair feature vectors are classified by the K classifiers, the class label of each node pair is obtained, and the node-pair classification matrix is built.
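The construction of the node-pair classification matrix can be sketched as follows. The patent does not state which classifier is used, so a nearest-centroid rule over K hypothetical cluster centers is assumed here purely for illustration; the feature values, centroids and function names are likewise hypothetical.

```python
def nearest_class(vec, centroids):
    """Return the index of the centroid closest to vec (squared distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))


def classification_matrix(feats_p, feats_q, centroids):
    """Classify all n x n candidate pairs (i in P, j in Q)."""
    return [
        [nearest_class(fp + fq, centroids) for fq in feats_q]
        for fp in feats_p
    ]


# Hypothetical [out_degree, in_degree] features of the unknown nodes.
feats_p = [[2, 1], [0, 3]]
feats_q = [[2, 1], [0, 3]]
# Hypothetical centers of K = 3 clusters learned on the training set.
centroids = [[2, 1, 2, 1], [0, 3, 0, 3], [1, 2, 1, 2]]

labels = classification_matrix(feats_p, feats_q, centroids)
```

Entry `labels[i][j]` is the class label of the candidate pair formed by node i of P and node j of Q.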
The matching unit 43 fills the classification matrix with energy factors, replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ of that class, builds the energy matrix, and computes the optimal matching of the energy matrix.
The model used by the matching unit 43 to compute the optimal matching of the energy matrix is:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n)$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n)$$
$$\lambda_{ij} \in \{0, 1\}$$
where $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence: it is set to 1 if the correspondence holds and to 0 otherwise. A matching result is written as $V_P(i) \leftrightarrow V_Q(j)$.
The integration unit 44 obtains ξ prediction results and lets each prediction result cast votes in the node-pair matching matrix, producing the voting matrix V-Matrix = $(V_{ij})$. It then solves the optimal matching problem of the voting matrix V-Matrix using the following model:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n)$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n)$$
$$\lambda_{ij} \in \{0, 1\}$$
where $v_{ij}$ is the vote count in row i, column j of the voting matrix, and $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence, i.e. it represents the final matching result of the node pair.
The search method and device for person information of the invention can confirm identity across multiple networks for records that carry different identity information but the same name, i.e. determine whether people with different identity information but the same name are the same person, which improves the accuracy of statistics. The algorithm adopted is efficient and fast to compute, and the accuracy of the results keeps improving as the sample library grows.
The storage management method and system for a smart meter warehouse provided by the above embodiments select a suitable storage strategy for each warehouse scenario by solving for an optimized storage strategy. This not only makes effective use of storage space, improves warehouse operating efficiency and reduces operating costs, but also brings many management benefits to the smart meter warehouse as a whole.
Those of ordinary skill in the art should understand that the above are only specific embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A search method for person information, characterized by comprising:
obtaining person identity information and person identity correspondences from multiple networks;
taking the set of person identity information with known person identity correspondences as a training set;
building, from the person identity information in the training set, a minimum-energy model based on the similarity of people's behavior, and obtaining energy factors and a matching-relationship classifier;
matching any two pieces of person identity information according to the matching-relationship classifier, filling in energy values using the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction;
integrating the matching results of repeated solutions, obtaining the person identity correspondences, and determining whether people with the same name are the same individual;
receiving an input name, and obtaining the person identity information corresponding to the name from the multiple networks;
displaying, according to the identity determination for people with the same name, the identity information of the same person in the different networks on a single web page, wherein the identity information comprises: e-mail address, telephone number and employer.
2. The method of claim 1, characterized in that taking the set of person identity information with known person identity correspondences as a training set, building, from the person identity information in the training set, a minimum-energy model based on the similarity of people's behavior, and obtaining energy factors and a matching-relationship classifier comprises:
for any given node V(i) in the two networks P and Q, building its network topology feature vector $f(i) = \{f_1, f_2, \ldots, f_d\}$, wherein a node represents a person's identity information, and $f_1, \ldots, f_d$ are the basic attribute features of the node, comprising: out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors;
building the node-pair feature vector, which for the two networks P and Q is:
$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\};$$
clustering the matched node pairs according to the node-pair feature vector to obtain the number of node pairs with similar features in each cluster class C, taking this number as the particle feature, using the distribution of the node-pair features as the energy factor, and assigning the same energy factor to people with similar network behavior;
building the energy model of the matched node pairs, wherein $\beta_i$ is the number of node pairs in class i after the matched node pairs with similar features have been clustered by feature, and $\varepsilon_i$ is the energy factor of that class;
obtaining, from the energy model, the energy factor of each cluster class: $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, and using these values as the energy factors of the classes to which node pairs belong during prediction;
building K classifiers from the clustering result, and assigning a class number to each node pair.
3. The method of claim 1, characterized in that matching any two pieces of person identity information according to the matching-relationship classifier, filling in energy values using the energy factors to form an energy matrix, and solving the energy matrix to obtain the matching result of a single prediction comprises:
extracting the topology features of the nodes with unknown identity correspondence in networks P and Q: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$;
for any nodes i ∈ P and j ∈ Q with unknown identity correspondence, building the n × n matched node-pair feature vectors of all unknown nodes:
$$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\};$$
classifying the matched node-pair feature vectors with the K classifiers, obtaining the class label of each node pair, and building the node-pair classification matrix;
filling the classification matrix with energy factors by replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ of that class, and building the energy matrix;
computing the optimal matching of the energy matrix.
4. The method of claim 3, characterized in that
the model for computing the optimal matching of the energy matrix is:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij};$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n);$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\};$$
wherein $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence, being set to 1 if the correspondence holds and to 0 otherwise, and a matching result is written as $V_P(i) \leftrightarrow V_Q(j)$.
5. The method of claim 3 or 4, characterized in that integrating the matching results of repeated solutions, obtaining the person identity correspondences, and determining whether people with the same name are the same individual comprises:
obtaining ξ prediction results, and letting each prediction result cast votes in the node-pair matching matrix to obtain the voting matrix V-Matrix = $(V_{ij})$;
solving the optimal matching problem of the voting matrix V-Matrix using the following model:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij};$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n);$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n);$$
$$\lambda_{ij} \in \{0, 1\};$$
wherein $v_{ij}$ is the vote count in row i, column j of the voting matrix, and $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence, i.e. it represents the final matching result of the node pair.
6. A search device for person information, characterized by comprising:
an information acquisition unit, configured to obtain person identity information and person identity correspondences from multiple networks;
a training set generation unit, configured to take the set of person identity information with known person identity correspondences as a training set, to build, from the person identity information in the training set, a minimum-energy model based on the similarity of people's behavior, and to obtain energy factors and a matching-relationship classifier;
a matching unit, configured to match any two pieces of person identity information according to the matching-relationship classifier, to fill in energy values using the energy factors to form an energy matrix, and to solve the energy matrix to obtain the matching result of a single prediction;
an integration unit, configured to integrate the matching results of repeated solutions, to obtain the person identity correspondences, and to determine whether people with the same name are the same individual;
a search unit, configured to receive an input name, to obtain the person identity information corresponding to the name from the multiple networks, and to display, according to the identity determination for people with the same name, the identity information of the same person in the different networks on a single web page;
wherein the identity information comprises: e-mail address, telephone number and employer.
7. The device of claim 6, characterized in that:
the training set generation unit comprises:
a node-pair feature building submodule, configured to build, for any given node V(i) in the two networks P and Q, its network topology feature vector $f(i) = \{f_1, f_2, \ldots, f_d\}$, wherein a node represents a person's identity information, and $f_1, \ldots, f_d$ are the basic attribute features of the node, comprising: out-degree, in-degree, clustering coefficient, neighbor nodes, average degree and common neighbors; to build the node-pair feature vector, which for the two networks P and Q is: $$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(i)) = \{f_P(1), f_P(2), \ldots, f_P(m), f_Q(1), f_Q(2), \ldots, f_Q(m)\};$$ and to cluster the matched node pairs according to the node-pair feature vector to obtain the number of node pairs with similar features in each cluster class C, take this number as the particle feature, use the distribution of the node-pair features as the energy factor, and assign the same energy factor to people with similar network behavior;
a classifier generation submodule, configured to build the energy model of the matched node pairs, wherein $\beta_i$ is the number of node pairs in class i after the matched node pairs with similar features have been clustered by feature, and $\varepsilon_i$ is the energy factor of that class; to obtain, from the energy model, the energy factor of each cluster class: $\varepsilon = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, and use these values as the energy factors of the classes to which node pairs belong during prediction; and to build K classifiers from the clustering result and assign a class number to each node pair.
8. The device of claim 6, characterized in that:
the training set generation unit further comprises:
a node-pair classification building submodule, configured to extract the topology features of the nodes with unknown identity correspondence in networks P and Q: $F_P(i) = \{f_P(1), f_P(2), \ldots, f_P(m)\}$ and $F_Q(i) = \{f_Q(1), f_Q(2), \ldots, f_Q(m)\}$; to build, for any nodes i ∈ P and j ∈ Q with unknown identity correspondence, the n × n matched node-pair feature vectors of all unknown nodes: $$F_{P \leftrightarrow Q} = F(V_P(i), V_Q(j)) = \{f_P(1), f_P(2), \ldots, f_P(n), f_Q(1), f_Q(2), \ldots, f_Q(n)\};$$ and to classify the matched node-pair feature vectors with the K classifiers, obtain the class label of each node pair, and build the node-pair classification matrix;
the matching unit is further configured to fill the classification matrix with energy factors by replacing each class label in the classification matrix with the energy factor $\varepsilon_{i=\text{category}}$ of that class, to build the energy matrix, and to compute the optimal matching of the energy matrix.
9. The device of claim 8, characterized in that
the model used by the matching unit to compute the optimal matching of the energy matrix is:
$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij} \lambda_{ij};$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n);$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n),$$
$$\lambda_{ij} \in \{0, 1\};$$
wherein $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence, being set to 1 if the correspondence holds and to 0 otherwise, and a matching result is written as $V_P(i) \leftrightarrow V_Q(j)$.
10. The device of claim 8 or 9, characterized in that:
the integration unit is further configured to obtain ξ prediction results, to let each prediction result cast votes in the node-pair matching matrix to obtain the voting matrix V-Matrix = $(V_{ij})$, and to solve the optimal matching problem of the voting matrix V-Matrix using the following model:
$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \lambda_{ij};$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \lambda_{ij} = 1 \quad (j = 1, 2, \ldots, n);$$
$$\sum_{j=1}^{n} \lambda_{ij} = 1 \quad (i = 1, 2, \ldots, n);$$
$$\lambda_{ij} \in \{0, 1\};$$
wherein $v_{ij}$ is the vote count in row i, column j of the voting matrix, and $\lambda_{ij}$ indicates whether node i in network P and node j in network Q are in one-to-one correspondence, i.e. it represents the final matching result of the node pair.
CN201410720437.2A 2014-12-01 2014-12-01 Search method and device for figure information Pending CN104376116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410720437.2A CN104376116A (en) 2014-12-01 2014-12-01 Search method and device for figure information

Publications (1)

Publication Number Publication Date
CN104376116A true CN104376116A (en) 2015-02-25

Family

ID=52555023

Country Status (1)

Country Link
CN (1) CN104376116A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120041979A1 (en) * 2010-08-12 2012-02-16 Industry-Academic Cooperation Foundation, Yonsei University Method for generating context hierarchy and system for generating context hierarchy
CN102651030A (en) * 2012-04-09 2012-08-29 华中科技大学 Social network association searching method based on graphics processing unit (GPU) multiple sequence alignment algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
夏虎: "移动社交网络结构和行为研究及其应用", 《中国博士学位论文全文数据库 信息科技辑》 *
邢志宇: "人物信息的网络检索途径与方法", 《河南图书馆学刊》 *
郎君 等: "基于社会网络的人名检索结果重名消解", 《计算机学报》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372938A (en) * 2015-07-21 2017-02-01 华为技术有限公司 Abnormal account identification method and system
CN106817251A (en) * 2016-12-23 2017-06-09 烟台中科网络技术研究所 A kind of link prediction method and device based on node similarity
CN106817251B (en) * 2016-12-23 2020-05-19 烟台中科网络技术研究所 Link prediction method and device based on node similarity
CN107908749A (en) * 2017-11-17 2018-04-13 哈尔滨工业大学(威海) A kind of personage's searching system and method based on search engine
WO2020118584A1 (en) * 2018-12-12 2020-06-18 Microsoft Technology Licensing, Llc Automatically generating training data sets for object recognition
CN115866292A (en) * 2021-08-05 2023-03-28 聚好看科技股份有限公司 Server, display device and screenshot recognition method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Wang Jinghua

Inventor after: Chen Xi

Inventor after: Xu Huiming

Inventor after: Guo Guang

Inventor after: Xie Naibo

Inventor after: Wei Minglei

Inventor before: Wang Jinghua

Inventor before: Chen Xi

Inventor before: Guo Guang

Inventor before: Xie Naibo

Inventor before: Wei Minglei

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WANG JINGHUA CHEN XI GUO GUANG XIE NAIBO WEI MINGLEI TO: WANG JINGHUA CHEN XI XU HUIMING GUO GUANG XIE NAIBO WEI MINGLEI

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150225
