CN105488247A - K-mean community structure mining method and apparatus - Google Patents


Publication number
CN105488247A
Authority
CN
China
Prior art keywords
node
coordinate
seed
community
network
Legal status
Pending
Application number
CN201510784716.XA
Other languages
Chinese (zh)
Inventor
范科峰
李琳
姚相振
周睿康
樊晓贺
叶润国
Current Assignee
China Electronics Standardization Institute
Original Assignee
China Electronics Standardization Institute
Application filed by China Electronics Standardization Institute
Priority to CN201510784716.XA
Publication of CN105488247A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/18 Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Geometry (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Optimization (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a K-mean community structure mining method and apparatus. According to the mining method and apparatus, a clustering center is reasonably selected as an initial seed node in a K-mean clustering algorithm, and finally network nodes belonging to a same community are clustered in a low-dimensional Euclidean space by utilizing the K-mean algorithm, so that the community mining precision is improved and the accuracy of subsequent system analysis is enhanced.

Description

A K-means community structure mining method and apparatus
Technical field
The present invention relates to the field of network technology, and in particular to a K-means community structure mining method and apparatus.
Background
Many complex systems in the real world can be abstracted as complex networks consisting of nodes and the edges between them. For example, web pages on the Internet can be regarded as the nodes of a complex network and the hyperlinks between pages as the edges between nodes, so that the whole Internet is abstracted as a complex network. In an online social network, each virtual user can be abstracted as a network node, and operations between virtual users such as following each other or adding each other as friends can be regarded as edges; on this basis the whole online social network can be represented as a complex network. In a national public transport network, taking the stations in each city as nodes and the traffic routes between cities as edges yields an abstract graph of the national public transport network. A biological protein network takes different proteins as nodes and uses edges to reveal the interactions between them. As an effective tool for studying complex systems, the properties of complex networks have therefore attracted wide attention from scholars in many fields.
Research by many scientists has shown that complex networks have many topological properties, of which community structure is an important one. Community structure divides the nodes of a complex network into several subsets such that the edges between nodes inside a community are comparatively dense while the edges between communities are comparatively sparse. This highly cohesive structure reveals the structural, functional and organisational features of a complex system well. For example, community structure in the Internet reflects groups of sites that discuss a common topic, while community structure in an online social network corresponds to a group of people with common interests. Mining community structure is therefore of great practical significance for analysing the characteristics and functions of a network. In recent years, as more and more complex systems have been abstracted as complex networks, mining their community structure with high precision has become important for analysing the physical characteristics of complex systems. However, the precision of current community mining is not high, which affects the accuracy of subsequent system analysis.
Summary of the invention
In view of the above analysis, the present invention aims to provide a K-means community structure mining method and apparatus, in order to solve the problem of low community mining precision in the prior art.
The present invention is mainly achieved through the following technical solutions:
One aspect of the present invention provides a K-means community structure mining method, comprising:
selecting K initial seed communities by a seed screening method based on local expansion, and calculating the centre node of each initial seed community as an initial seed node, the coordinates of the centre node serving as the coordinate values of a K-means clustering seed node, where $K \in [2, N-1]$;
performing K-means clustering with the coordinates of the initial seed nodes as input parameters, to obtain the community structure division under the current value of K;
obtaining the community structure divisions under different values of K, calculating the modularity of the community division result for each value of the parameter K, taking the initial seed node clustering result that maximises the modularity as the optimal division result, and taking the initial seed nodes belonging to the same class in the optimal division result as one community structure, to obtain the final division result.
Preferably, the method further comprises:
calculating, from the adjacency matrix $A$ of the complex network $G$, the number of common immediate neighbour nodes between each pair of network nodes to form a similarity matrix $S$, and calculating from the similarity matrix the distance matrix $D$ between the network nodes;
wherein the adjacency matrix $A$ is an $N \times N$ matrix constructed for a complex network graph $G$ with $N$ nodes $v_i$: $a_{ij} = 1$ when there is an edge between node $v_i$ and node $v_j$, $a_{ij} = 0$ when there is no direct edge between nodes $i$ and $j$, and $a_{ij} = 0$ when $i = j$, where $a_{ij}$ is an element of $A$ expressing the edge relationship between nodes, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$; the similarity matrix $S$ is an $N \times N$ matrix constructed for the same graph: $s_{ij} = (A_i, A_j)$ when there is an edge between node $v_i$ and node $v_j$, $s_{ij} = 0$ when there is no direct edge between nodes $i$ and $j$, and $s_{ij} = 0$ when $i = j$, where $s_{ij}$ is an element of $S$ and $A_i$ denotes the vector formed by the elements of the $i$-th row of $A$; the distance matrix $D$ is an $N \times N$ matrix constructed for the same graph, with $d_{ij} = (s_{ii} + s_{jj} - 2 s_{ij})^{1/2}$;
mapping each network node to node coordinates in a low-dimensional Euclidean space by multidimensional scaling (MDS); the MDS specifically comprises: decomposing the matrix $D$ as $D = U \Lambda U^{T}$, where $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ is a diagonal matrix whose elements $\lambda_i$ are the eigenvalues of $D$, sorted in descending order so that, without loss of generality, $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N$, and the $i$-th column of $U$ is the eigenvector corresponding to $\lambda_i$; if $p$ of the eigenvalues are greater than zero, $p \in [1, N]$, the eigenvectors corresponding to these $p$ eigenvalues are denoted $u_1, u_2, \ldots, u_p$ and form the columns of a matrix $P$, and the $N$ row vectors of $P$ give the coordinates of the $N$ network nodes in the $p$-dimensional space, i.e. $(x_1, x_2, \ldots, x_N)$;
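The chain from adjacency matrix to node coordinates can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions, not the patented implementation: the function name `embed_nodes` is hypothetical, and the similarity is taken as $S = A A^{T}$ with its diagonal kept (an assumption made here so that $s_{ii} + s_{jj} - 2 s_{ij}$ is never negative), whereas the text sets $s_{ii} = 0$. The eigenvectors are used unscaled, as the text describes; classical MDS would additionally double-centre the squared distances and scale by the square roots of the eigenvalues.

```python
import numpy as np

def embed_nodes(adj):
    """Map nodes to coordinates via S -> D -> eigendecomposition, as outlined in the text."""
    A = np.asarray(adj, dtype=float)
    # Inner products of adjacency rows count the common neighbours of each node pair.
    # Assumption: S = A A^T including the diagonal, so the distance below stays real.
    S = A @ A.T
    diag = np.diag(S)
    # d_ij = sqrt(s_ii + s_jj - 2 s_ij); clipping guards against tiny negative round-off.
    D = np.sqrt(np.clip(diag[:, None] + diag[None, :] - 2.0 * S, 0.0, None))
    # D is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(D)
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-9                      # the p strictly positive eigenvalues
    P = eigvecs[:, keep]                       # row i of P is the coordinate of node i
    return P, eigvals[keep]
```

Each row of the returned matrix is the $p$-dimensional coordinate of one node and can be passed directly to the seed selection and clustering steps.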
the selecting of K initial seed communities by the seed screening method based on local expansion, and the calculating of the centre node of each initial seed community as an initial seed node, specifically comprises:
selecting K initial seed communities by the seed screening method based on local expansion, and calculating, from the Euclidean coordinates of the network nodes, the centre coordinates of each initial seed community as the initial seed node of that community.
Preferably, the selecting of K initial seed communities by the seed screening method based on local expansion, and the calculating, from the Euclidean coordinates of the network nodes, of the centre coordinates of each initial seed community as the initial seed node of that community, specifically comprises:
setting a flag bit $\delta_i$ ($i = 1, 2, \ldots, N$) for each node in the complex network $G$, where 0 indicates that the node has not been marked and 1 indicates that it has been marked;
searching for all complete subgraphs in the network, calculating the mean of the weights in each complete subgraph found, and sorting the subgraphs in descending order of this value, denoted $G_1 \ge G_2 \ge \ldots \ge G_M$;
selecting, among the nodes whose flag bit is currently 0, the node with the largest degree, denoted $a$; selecting from $G_1, G_2, \ldots, G_M$ the complete subgraphs that contain node $a$ to form a subset, where $r$, the index of a subgraph in the subset, is any natural number from 1 to $M$; and choosing from this subset the complete subgraph $G_i$ that satisfies condition (1) and condition (2), its centre node coordinates being taken as an initial seed node:
condition (1): the flag bits of all nodes in $G_i$ are 0;
condition (2): $G_i \ge G_j$ for any $G_j$ in the subset;
initialising a node subset $\Omega$ and, with the selected initial seed community $G_i$ as the core, expanding the scope of $G_i$ according to the edge tightness $F_C = \frac{k_{in}^{C}}{(k_{in}^{C})^{\alpha} + (k_{out}^{C})^{\beta}}$, where $\alpha$ and $\beta$ are real numbers between 0 and 1: for an immediate neighbour node $v$ of $G_i$, the subgraph obtained by adding $v$ to $G_i$ is denoted $G_i + v$ and its edge tightness $F_{G_i + v}$ is obtained from this formula; if $F_{G_i + v} \ge F_{G_i}$, node $v$ is added to subgraph $G_i$, otherwise it is not; every neighbour node that increases the edge tightness of $G_i$ is put into the node subset $\Omega$ until $F_{G_i}$ no longer increases, and the flag bit of every node in $\Omega$ is set to 1;
the above is carried out until all K seed communities have been initialised or $G_1, G_2, \ldots, G_M$ no longer contain any unmarked complete subgraph; if the number $\theta$ of seed nodes found so far is less than K, the remaining seed nodes are selected by the mutual-farthest-distance principle;
the mutual-farthest-distance principle specifically comprises: supposing that the coordinates of the $\theta'$ ($\theta' \in [\theta + 1, K]$) seed nodes found so far are $y_1, y_2, \ldots, y_{\theta'}$ and the coordinates of the nodes in the network other than the seed nodes are $o_1, o_2, \ldots, o_{N - \theta'}$, the coordinate $o'$ of the remaining seed node selected under this principle is the one, among the node coordinates in the network other than $y_1, y_2, \ldots, y_{\theta'}$, that satisfies the farthest-distance condition, $o' \in \{o_1, o_2, \ldots, o_{N - \theta'}\}$; the centre coordinates, after MDS mapping, of the nodes in each of the K seed communities are calculated and used as the coordinates $\{x'_1, x'_2, \ldots, x'_K\}$ of the initial seed nodes of K-means;
the centre coordinates of each initial seed community are calculated as follows: for a seed community $G_i$ found, $i = 1, 2, \ldots, K$, if it contains $g$ nodes with coordinates $y_1, y_2, \ldots, y_g$, the centre coordinate of the seed community is $\frac{1}{g} \sum_{j=1}^{g} y_j$.
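The two coordinate-level operations in the steps above, taking the centre of a seed community and picking a further seed that is far from the seeds already found, can be sketched as follows. This is a hedged illustration only: the function names `seed_centroid` and `farthest_node` are hypothetical, and because the selection formula is not reproduced in this text, the sketch assumes a max-min rule (choose the node whose nearest existing seed is farthest away), which may differ from the criterion actually claimed.

```python
import numpy as np

def seed_centroid(coords, members):
    """Centre of a seed community: the mean of its members' coordinates."""
    X = np.asarray(coords, dtype=float)
    return X[np.array(list(members), dtype=int)].mean(axis=0)

def farthest_node(coords, seed_coords, excluded=()):
    """Index of the non-excluded node whose nearest seed is farthest away.

    Assumption: a max-min distance rule stands in for the formula missing from the text.
    """
    X = np.asarray(coords, dtype=float)
    Y = np.asarray(seed_coords, dtype=float)
    # Pairwise Euclidean distances, shape (number of nodes, number of seeds).
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    score = dists.min(axis=1)
    # Nodes already used as seeds (or otherwise excluded) can never be chosen.
    score[np.array(list(excluded), dtype=int)] = -np.inf
    return int(np.argmax(score))
```

In use, `seed_centroid` would be applied to each expanded seed community, and `farthest_node` would be called repeatedly, appending each newly chosen coordinate to `seed_coords`, until K seeds are available.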
Preferably, the performing of K-means clustering with the coordinates of the initial seed nodes as input parameters, to obtain the community structure division under the current value of K, specifically comprises:
step (1): taking $(x'_1, x'_2, \ldots, x'_K)$ as the initial cluster centres and setting the $K$ sets $Z_i$, $i = 1, 2, \ldots, K$, to the empty set;
step (2): repeating steps (3) and (4) until no cluster changes any more;
step (3): in the $t$-th iteration, assigning the coordinate $x''$ of each node in the network to one of the $K$ classes as follows: for a class $i$, $i = 1, 2, \ldots, K$, if $\|x'' - x'_i\| < \|x'' - x'_j\|$ for all $j \ne i$, $j = 1, 2, \ldots, K$, then $x'' \in Z_i$, where $t$ is a natural number starting from 1 and $\|a - b\|$ is the Euclidean distance between coordinates $a$ and $b$, both being $p$-dimensional vectors;
step (4): from the result of step (3), calculating the $j$-th component of the new centre coordinate of class $i$ as $x'_{ij} = \frac{1}{|Z_i|} \sum_{x'' \in Z_i} x''_j$, where $|Z_i|$ is the number of elements in class $Z_i$, so that the centre coordinate of class $i$ is $(x'_{i1}, x'_{i2}, \ldots, x'_{ip})$; calculating the value $J'$ from the newly obtained centre coordinates; if $|J' - J| < \delta$, terminating and outputting the clustering result $Z_i$ ($i = 1, 2, \ldots, K$); otherwise setting $J = J'$ and returning to step (3).
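Steps (1) to (4) amount to Lloyd-style K-means started from supplied centres instead of random ones. The sketch below is a minimal illustration under assumptions: the function name `kmeans_from_seeds` and the parameters `tol` and `max_iter` are hypothetical, and since the expression for $J$ is not reproduced in this text, $J$ is assumed to be the sum of squared distances from each node to its cluster centre.

```python
import numpy as np

def kmeans_from_seeds(coords, init_centers, tol=1e-9, max_iter=100):
    """K-means that starts from the given seed coordinates."""
    X = np.asarray(coords, dtype=float)
    centers = np.array(init_centers, dtype=float)   # copy so the caller's seeds are untouched
    prev_J = np.inf
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step (3): assign every node to its nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step (4): each centre becomes the component-wise mean of its class.
        for i in range(len(centers)):
            members = X[labels == i]
            if len(members):                         # an empty class keeps its old centre
                centers[i] = members.mean(axis=0)
        # Assumed objective J: sum of squared node-to-centre distances.
        J = float(((X - centers[labels]) ** 2).sum())
        if abs(prev_J - J) < tol:                    # |J' - J| < delta, so stop
            break
        prev_J = J
    return labels, centers
```

Because the starting centres are fixed, the result is deterministic for a given input, which is the property the seed selection step is meant to exploit.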
Preferably, the obtaining of the community structure divisions under different values of K, calculating the modularity of the community division result for each value of the parameter K, taking the initial seed node clustering result that maximises the modularity as the optimal division result, and taking the initial seed nodes belonging to the same class in the optimal division result as one community structure, to obtain the final division result, specifically comprises:
performing a community division $C$ on the complex network graph $G$, denoted $C = \{c_1, c_2, \ldots, c_p\}$, where $p$ is any natural number from 1 to $N$ and $c_i$ ($i = 1, 2, \ldots, p$) is a set formed by some of the nodes of $G$, and calculating the modularity under the current community division as $Q = \frac{1}{2M} \sum_{i,j} \left( a_{ij} - \frac{k_i k_j}{2M} \right) \delta(c_m, c_n)$, where $k_i$ is the number of edges between network node $v_i$ and the other nodes of the network, $i = 1, 2, \ldots, N$, $M$ is the number of edges in the network, $c_m$ and $c_n$ are the numbers of the communities to which nodes $v_i$ and $v_j$ respectively belong, $m \in [1, p]$, $n \in [1, p]$, and $\delta(c_m, c_n) = 1$ if $c_m = c_n$ and 0 otherwise.
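The modularity of a given division can be evaluated directly from the adjacency matrix. The sketch below assumes the standard Newman modularity for an undirected, unweighted graph, which matches the symbols defined in the text ($k_i$, $M$ and the community numbers); the function name `modularity` is hypothetical.

```python
import numpy as np

def modularity(adj, labels):
    """Q = (1 / 2M) * sum_ij (a_ij - k_i k_j / 2M) * [nodes i and j share a community]."""
    A = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                     # degree of every node
    two_m = A.sum()                       # equals 2M for an undirected graph
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)
```

For two triangles joined by a single edge and divided into the two triangles, this evaluates to 5/14; putting all nodes into one community gives 0.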
Another aspect of the present invention provides a K-means community structure mining apparatus, comprising:
a computing unit, configured to select K initial seed communities by a seed screening method based on local expansion and to calculate the centre node of each initial seed community as an initial seed node, the coordinates of the centre node serving as the coordinate values of a K-means clustering seed node, where $K \in [2, N-1]$;
an acquiring unit, configured to perform K-means clustering with the coordinates of the initial seed nodes as input parameters, to obtain the community structure division under the current value of K;
a processing unit, configured to obtain the community structure divisions under different values of K, calculate the modularity of the community division result for each value of the parameter K, take the initial seed node clustering result that maximises the modularity as the optimal division result, and take the initial seed nodes belonging to the same class in the optimal division result as one community structure, to obtain the final division result.
Preferably, the apparatus further comprises a mapping unit;
the mapping unit is configured to calculate, from the adjacency matrix $A$ of the complex network $G$, the number of common immediate neighbour nodes between each pair of network nodes to form a similarity matrix $S$, and to calculate from the similarity matrix the distance matrix $D$ between the network nodes; wherein the adjacency matrix $A$ is an $N \times N$ matrix constructed for a complex network graph $G$ with $N$ nodes $v_i$: $a_{ij} = 1$ when there is an edge between node $v_i$ and node $v_j$, $a_{ij} = 0$ when there is no direct edge between nodes $i$ and $j$, and $a_{ij} = 0$ when $i = j$, where $a_{ij}$ is an element of $A$ expressing the edge relationship between nodes, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$; the similarity matrix $S$ is an $N \times N$ matrix constructed for the same graph: $s_{ij} = (A_i, A_j)$ when there is an edge between node $v_i$ and node $v_j$, $s_{ij} = 0$ when there is no direct edge between nodes $i$ and $j$, and $s_{ij} = 0$ when $i = j$, where $s_{ij}$ is an element of $S$ and $A_i$ denotes the vector formed by the elements of the $i$-th row of $A$; the distance matrix $D$ is an $N \times N$ matrix constructed for the same graph, with $d_{ij} = (s_{ii} + s_{jj} - 2 s_{ij})^{1/2}$; each network node is mapped to node coordinates in a low-dimensional Euclidean space by multidimensional scaling (MDS); the MDS specifically comprises: decomposing the matrix $D$ as $D = U \Lambda U^{T}$, where $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ is a diagonal matrix whose elements $\lambda_i$ are the eigenvalues of $D$, sorted in descending order so that, without loss of generality, $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N$, and the $i$-th column of $U$ is the eigenvector corresponding to $\lambda_i$; if $p$ of the eigenvalues are greater than zero, $p \in [1, N]$, the eigenvectors corresponding to these $p$ eigenvalues are denoted $u_1, u_2, \ldots, u_p$ and form the columns of a matrix $P$, and the $N$ row vectors of $P$ give the coordinates of the $N$ network nodes in the $p$-dimensional space, i.e. $(x_1, x_2, \ldots, x_N)$;
the computing unit is specifically configured to select K initial seed communities by the seed screening method based on local expansion, and to calculate, from the Euclidean coordinates of the network nodes, the centre coordinates of each initial seed community as the initial seed node of that community.
Preferably, the computing unit is specifically configured to: set a flag bit $\delta_i$ ($i = 1, 2, \ldots, N$) for each node in the complex network $G$, where 0 indicates that the node has not been marked and 1 indicates that it has been marked; search for all complete subgraphs in the network, calculate the mean of the weights in each complete subgraph found, and sort the subgraphs in descending order of this value, denoted $G_1 \ge G_2 \ge \ldots \ge G_M$; select, among the nodes whose flag bit is currently 0, the node with the largest degree, denoted $a$; select from $G_1, G_2, \ldots, G_M$ the complete subgraphs that contain node $a$ to form a subset, where $r$, the index of a subgraph in the subset, is any natural number from 1 to $M$; and choose from this subset the complete subgraph $G_i$ that satisfies condition (1) and condition (2), its centre node coordinates being taken as an initial seed node: condition (1): the flag bits of all nodes in $G_i$ are 0; condition (2): $G_i \ge G_j$ for any $G_j$ in the subset; initialise a node subset $\Omega$ and, with the selected initial seed community $G_i$ as the core, expand the scope of $G_i$ according to the edge tightness $F_C = \frac{k_{in}^{C}}{(k_{in}^{C})^{\alpha} + (k_{out}^{C})^{\beta}}$, where $\alpha$ and $\beta$ are real numbers between 0 and 1: for an immediate neighbour node $v$ of $G_i$, the subgraph obtained by adding $v$ to $G_i$ is denoted $G_i + v$ and its edge tightness $F_{G_i + v}$ is obtained from this formula; if $F_{G_i + v} \ge F_{G_i}$, node $v$ is added to subgraph $G_i$, otherwise it is not; every neighbour node that increases the edge tightness of $G_i$ is put into the node subset $\Omega$ until $F_{G_i}$ no longer increases, and the flag bit of every node in $\Omega$ is set to 1; the above is carried out until all K seed communities have been initialised or $G_1, G_2, \ldots, G_M$ no longer contain any unmarked complete subgraph; if the number $\theta$ of seed nodes found so far is less than K, the remaining seed nodes are selected by the mutual-farthest-distance principle; the mutual-farthest-distance principle specifically comprises: supposing that the coordinates of the $\theta'$ ($\theta' \in [\theta + 1, K]$) seed nodes found so far are $y_1, y_2, \ldots, y_{\theta'}$ and the coordinates of the nodes in the network other than the seed nodes are $o_1, o_2, \ldots, o_{N - \theta'}$, the coordinate $o'$ of the remaining seed node selected under this principle is the one, among the node coordinates in the network other than $y_1, y_2, \ldots, y_{\theta'}$, that satisfies the farthest-distance condition, $o' \in \{o_1, o_2, \ldots, o_{N - \theta'}\}$; the centre coordinates, after MDS mapping, of the nodes in each of the K seed communities are calculated and used as the coordinates $\{x'_1, x'_2, \ldots, x'_K\}$ of the initial seed nodes of K-means; the centre coordinates of each initial seed community are calculated as follows: for a seed community $G_i$ found, $i = 1, 2, \ldots, K$, if it contains $g$ nodes with coordinates $y_1, y_2, \ldots, y_g$, the centre coordinate of the seed community is $\frac{1}{g} \sum_{j=1}^{g} y_j$.
Preferably, the acquiring unit is specifically configured to carry out: step (1): taking $(x'_1, x'_2, \ldots, x'_K)$ as the initial cluster centres and setting the $K$ sets $Z_i$, $i = 1, 2, \ldots, K$, to the empty set; step (2): repeating steps (3) and (4) until no cluster changes any more; step (3): in the $t$-th iteration, assigning the coordinate $x''$ of each node in the network to one of the $K$ classes as follows: for a class $i$, $i = 1, 2, \ldots, K$, if $\|x'' - x'_i\| < \|x'' - x'_j\|$ for all $j \ne i$, $j = 1, 2, \ldots, K$, then $x'' \in Z_i$, where $t$ is a natural number starting from 1 and $\|a - b\|$ is the Euclidean distance between coordinates $a$ and $b$, both being $p$-dimensional vectors; step (4): from the result of step (3), calculating the $j$-th component of the new centre coordinate of class $i$ as $x'_{ij} = \frac{1}{|Z_i|} \sum_{x'' \in Z_i} x''_j$, where $|Z_i|$ is the number of elements in class $Z_i$, so that the centre coordinate of class $i$ is $(x'_{i1}, x'_{i2}, \ldots, x'_{ip})$; calculating the value $J'$ from the newly obtained centre coordinates; if $|J' - J| < \delta$, terminating and outputting the clustering result $Z_i$ ($i = 1, 2, \ldots, K$); otherwise setting $J = J'$ and returning to step (3).
Preferably, the processing unit is specifically configured to perform a community division $C$ on the complex network graph $G$, denoted $C = \{c_1, c_2, \ldots, c_p\}$, where $p$ is any natural number from 1 to $N$ and $c_i$ ($i = 1, 2, \ldots, p$) is a set formed by some of the nodes of $G$, and to calculate the modularity under the current community division as $Q = \frac{1}{2M} \sum_{i,j} \left( a_{ij} - \frac{k_i k_j}{2M} \right) \delta(c_m, c_n)$, where $k_i$ is the number of edges between network node $v_i$ and the other nodes of the network, $i = 1, 2, \ldots, N$, $M$ is the number of edges in the network, $c_m$ and $c_n$ are the numbers of the communities to which nodes $v_i$ and $v_j$ respectively belong, $m \in [1, p]$, $n \in [1, p]$, and $\delta(c_m, c_n) = 1$ if $c_m = c_n$ and 0 otherwise.
The present invention reasonably selects cluster centres as the initial seed nodes of the K-means clustering algorithm and then uses the K-means algorithm to cluster, in a low-dimensional Euclidean space, the network nodes belonging to the same community, thereby improving community mining precision and in turn the accuracy of subsequent system analysis.
Other features and advantages of the present invention will be set forth in the following description, and in part will become apparent from the description or be understood by practising the invention. The objects and other advantages of the invention can be realised and obtained by the structure particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a K-means community structure mining method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another K-means community structure mining method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the overlapping community structure obtained by an embodiment of the present invention on the karate club network;
Fig. 4 is a schematic structural diagram of a K-means community structure mining apparatus according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention uses the Pearson correlation coefficient formula to measure the similarity between the nodes of a complex network and maps the network nodes to node coordinates in a low-dimensional Euclidean space by multidimensional scaling. On this basis, an improved K-means algorithm based on local diffusion is proposed, which reasonably selects cluster centres according to the topological features of community structure in complex networks and clusters the network nodes belonging to the same community according to the topological structure features of communities. Compared with the prior art, the invention overcomes the limitation of conventional K-means-based community mining methods, whose random selection of seed nodes leads to poor precision; it improves the accuracy of community mining and solves the problem of mining community structure in complex networks well. For a better understanding of the invention, it is described in detail below with several specific examples.
System embodiment
An embodiment of the present invention provides a K-means community structure mining method. Referring to Fig. 1, the method comprises:
S101: selecting K initial seed communities by a seed screening method based on local expansion, and calculating the centre node of each initial seed community as an initial seed node, the coordinates of the centre node serving as the coordinate values of a K-means clustering seed node, where $K \in [2, N-1]$;
S102: performing K-means clustering with the coordinates of the initial seed nodes as input parameters, to obtain the community structure division under the current value of K;
S103: obtaining the community structure divisions under different values of K, calculating the modularity of the community division result for each value of the parameter K, taking the initial seed node clustering result that maximises the modularity as the optimal division result, and taking the initial seed nodes belonging to the same class in the optimal division result as one community structure, to obtain the final division result.
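Step S103 reduces to scoring one division per candidate K and keeping the best. The sketch below illustrates only that selection; it assumes the divisions for each K have already been produced by S101 and S102, and the names `partition_modularity` and `select_best_k` are hypothetical. Modularity is computed here in its equivalent per-community form, $Q = \sum_c \left[ \frac{L_c}{M} - \left( \frac{d_c}{2M} \right)^2 \right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ the total degree of its nodes.

```python
import numpy as np

def partition_modularity(adj, labels):
    """Modularity as a sum over communities of (internal edge share - expected share)."""
    A = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    two_m = A.sum()                                   # 2M for an undirected graph
    q = 0.0
    for c in np.unique(labels):
        inside = labels == c
        internal = A[np.ix_(inside, inside)].sum()    # counts each internal edge twice
        degree_sum = A[inside].sum()                  # total degree of the community
        q += internal / two_m - (degree_sum / two_m) ** 2
    return float(q)

def select_best_k(adj, partitions_by_k):
    """Return the K whose division has the largest modularity, plus all scores."""
    scores = {k: partition_modularity(adj, labels) for k, labels in partitions_by_k.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

A caller would fill `partitions_by_k` by running the seeded K-means once for every K in $[2, N-1]$ and then read the final communities from the labels stored under `best_k`.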
The present invention reasonably selects cluster centres as the initial seed nodes of the K-means clustering algorithm and then uses the K-means algorithm to cluster, in a low-dimensional Euclidean space, the network nodes belonging to the same community, thereby improving community mining precision and in turn the accuracy of subsequent system analysis.
The method according to the embodiment of the present invention further comprises: calculating, from the adjacency matrix $A$ of the complex network $G$, the number of common immediate neighbour nodes between each pair of network nodes to form a similarity matrix $S$, and calculating from the similarity matrix the distance matrix $D$ between the network nodes;
mapping each network node to node coordinates in a low-dimensional Euclidean space by multidimensional scaling (MDS);
wherein the adjacency matrix $A$ is an $N \times N$ matrix constructed for a complex network graph $G$ with $N$ nodes $v_i$: $a_{ij} = 1$ when there is an edge between node $v_i$ and node $v_j$, $a_{ij} = 0$ when there is no direct edge between nodes $i$ and $j$, and $a_{ij} = 0$ when $i = j$, where $a_{ij}$ is an element of $A$ expressing the edge relationship between nodes, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$; the similarity matrix $S$ is an $N \times N$ matrix constructed for the same graph: $s_{ij} = (A_i, A_j)$ when there is an edge between node $v_i$ and node $v_j$, $s_{ij} = 0$ when there is no direct edge between nodes $i$ and $j$, and $s_{ij} = 0$ when $i = j$, where $s_{ij}$ is an element of $S$ and $A_i$ denotes the vector formed by the elements of the $i$-th row of $A$; the distance matrix $D$ is an $N \times N$ matrix constructed for the same graph, with $d_{ij} = (s_{ii} + s_{jj} - 2 s_{ij})^{1/2}$;
the MDS specifically comprises: decomposing the matrix $D$ as $D = U \Lambda U^{T}$, where $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ is a diagonal matrix whose elements $\lambda_i$ are the eigenvalues of $D$, sorted in descending order so that, without loss of generality, $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N$, and the $i$-th column of $U$ is the eigenvector corresponding to $\lambda_i$; if $p$ of the eigenvalues are greater than zero, $p \in [1, N]$, the eigenvectors corresponding to these $p$ eigenvalues are denoted $u_1, u_2, \ldots, u_p$ and form the columns of a matrix $P$, and the $N$ row vectors of $P$ give the coordinates of the $N$ network nodes in the $p$-dimensional space, i.e. $(x_1, x_2, \ldots, x_N)$;
In the embodiment of the present invention, the selecting of K initial seed communities by the seed screening method based on local expansion, and the calculating of the centre node of each initial seed community as an initial seed node, specifically comprises:
selecting K initial seed communities by the seed screening method based on local expansion, and calculating, from the Euclidean coordinates of the network nodes, the centre coordinates of each initial seed community as the initial seed node of that community.
In a specific implementation, the present invention sets a flag $\delta_i$ ($i=1,2,\ldots,N$) for each node in the complex network $G$, where 0 indicates that the node has not been marked and 1 indicates that it has been marked;
All complete subgraphs in the network are searched, the mean of the weights in each complete subgraph found is computed according to the formula for the mean weight, and the subgraphs are sorted in descending order of that mean, denoted $G_1\ge G_2\ge\ldots\ge G_M$;
Among the nodes whose flag is currently 0, the node with the largest degree is selected and denoted $a$. The complete subgraphs among $G_1,G_2,\ldots,G_M$ that contain node $a$ form a subset, where $r$ is an arbitrary natural number from 1 to $M$, and from this subset the complete subgraph $G_i$ satisfying condition (1) and condition (2) is chosen, its center coordinate being taken as an initial seed node:
Condition (1): the flags of all nodes in $G_i$ are 0;
Condition (2): $G_i\ge G_j$ for any $G_j$ in this subset;
A node subset $\Omega$ is initialized and, with the selected initial seed community $G_i$ as the core, the scope of $G_i$ is expanded according to the edge tightness $F_C=\dfrac{k_{in}^{C}}{(k_{in}^{C})^{\alpha}+(k_{out}^{C})^{\beta}}$, where $\alpha$ and $\beta$ are real numbers between 0 and 1. For an immediate neighbor node $v$ of $G_i$, let $G_i+v$ denote the subgraph obtained by adding $v$ to $G_i$, and let $F_{G_i+v}$ be its edge tightness computed by this formula: if $F_{G_i+v}\ge F_{G_i}$, node $v$ is added to $G_i$; otherwise, node $v$ is not added to $G_i$. Every neighbor node that increases the edge tightness of $G_i$ is put into the node subset $\Omega$ until the edge tightness no longer increases, and the flag of every node in $\Omega$ is then set to 1;
If all K seed communities have been initialized, or no unmarked complete subgraph remains among $G_1,G_2,\ldots,G_M$, and the number $\theta$ of seed nodes found so far is less than K, the remaining seed nodes are selected by the farthest-mutual-distance principle;
The center coordinates of the nodes of each of the K seed communities after MDS mapping are computed and used as the coordinates $(x'_1,x'_2,\ldots,x'_K)$ of the initial seed nodes for K-means.
The farthest-mutual-distance principle specifically comprises: let the coordinates of the $\theta'$ seed nodes found so far ($\theta'\in[\theta+1,K]$) be $y_1,y_2,\ldots,y_{\theta'}$, and let the coordinates of the network nodes other than the seed nodes be $o_1,o_2,\ldots,o_{N-\theta'}$; the coordinate $o'$ of the next seed node selected by this principle is then the one, among the coordinates of the network nodes other than $y_1,y_2,\ldots,y_{\theta'}$, that satisfies the corresponding selection formula, $o'\in\{o_1,o_2,\ldots,o_{N-\theta'}\}$; the center coordinates of the nodes of each of the K seed communities after MDS mapping are computed and used as the coordinates $\{x'_1,x'_2,\ldots,x'_K\}$ of the initial seed nodes for K-means;
The center coordinate of each initial seed community is computed as follows: for a seed community $G_i$ found, $i=1,2,\ldots,K$, containing $g$ nodes with coordinates $y_1,y_2,\ldots,y_g$, the center coordinate of the seed community is $\frac{1}{g}\sum_{l=1}^{g}y_l$.
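The two calculations above (the center coordinate of a seed community and the selection of remaining seeds) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `centroid` and `add_farthest_seeds` are hypothetical, the center coordinate is assumed to be the component-wise mean, and since the selection formula of the farthest-mutual-distance principle is not legible in this text, the sketch assumes a max-min criterion in which the next seed is the candidate whose nearest existing seed is farthest away.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length coordinate tuples."""
    g = len(points)
    return tuple(sum(p[k] for p in points) / g for k in range(len(points[0])))

def add_farthest_seeds(seeds, candidates, k):
    """Extend a non-empty seed list to k entries.

    Assumption: each new seed is the candidate whose distance to its nearest
    existing seed is largest (max-min); the source formula may differ.
    """
    seeds = list(seeds)
    remaining = [c for c in candidates if c not in seeds]
    while len(seeds) < k and remaining:
        best = max(remaining, key=lambda c: min(math.dist(c, s) for s in seeds))
        seeds.append(best)
        remaining.remove(best)
    return seeds
```

For example, `centroid([(0.0, 0.0), (2.0, 4.0)])` gives `(1.0, 2.0)`. If the intended criterion is instead the largest summed distance to all existing seeds, only the `key` function needs to change.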
Specifically, in the embodiment of the present invention, performing K-means clustering with the coordinates of the initial seed nodes as input parameters to obtain the community structure partition under the current value of K specifically comprises:
Step (1): take $(x'_1,x'_2,\ldots,x'_K)$ as the initial cluster centers and set the K sets $Z_i$, $i=1,2,\ldots,K$, to the empty set;
Step (2): repeat steps (3) to (4) until no cluster changes;
Step (3): in the $t$-th iteration, every node coordinate $x''$ in the network is assigned to one of the K classes as follows: for a class $i$, $i=1,2,\ldots,K$, if $\|x''-x'_i\|<\|x''-x'_j\|$ for all $j\ne i$, $j=1,2,\ldots,K$, then $x''\in Z_i$, where $t$ is a natural number starting from 1, $\|a-b\|$ is the Euclidean distance between coordinates $a$ and $b$, and $a$ and $b$ are $p$-dimensional vectors;
Step (4): from the result of step (3), compute the $j$-th component of the new center coordinate of class $i$ as $x'_{ij}=\frac{1}{|Z_i|}\sum_{x''\in Z_i}x''_j$, where $|Z_i|$ is the number of elements in class $Z_i$. The center coordinate of class $i$ is then $(x'_{i1},x'_{i2},\ldots,x'_{iK})$. The value $J'$ is computed from the newly obtained center coordinates; if $|J'-J|<\delta$, the procedure stops and outputs the clustering result $Z_i$ ($i=1,2,\ldots,K$); otherwise, set $J=J'$ and go to step (3).
In a specific implementation, obtaining the community structure partitions under different values of K, computing the modularity of the partition result for each value of K, choosing the initial-seed-node clustering result with the largest modularity as the optimal partition result, and taking the initial seed nodes that belong to the same class in the optimal partition result as one community structure to obtain the final partition result specifically comprises:
A community partition $C$ is applied to the complex network graph $G$, written $C=\{c_1,c_2,\ldots,c_p\}$, where $p$ is an arbitrary natural number from 1 to $N$ and each $c_i$ ($i=1,2,\ldots,p$) is a set of nodes in $G$, and the modularity under the current community partition is computed. In the modularity formula, the degree of network node $v_i$ is the number of edges between $v_i$ and the other nodes in the network, $i=1,2,\ldots,N$; $M$ is the number of edges in the network; and $c_m$ and $c_n$ are the numbers of the communities to which nodes $v_i$ and $v_j$ respectively belong, $m\in[1,p]$, $n\in[1,p]$.
Fig. 2 is a schematic flowchart of another K-means community structure mining method according to the embodiment of the present invention. The method of the present invention is described in detail below with reference to Fig. 2:
Input: the adjacency matrix $A$ of a complex network graph $G$ with $N$ nodes.
Output: a reasonable community structure partition $C=\{c_1,c_2,\ldots,c_m\}$ of the complex network $G$.
Step 1: from the adjacency matrix of the complex network, compute the number of common immediate neighbor nodes of each pair of nodes to form the similarity matrix $S$, and compute the distance matrix $D$ between the nodes of the network from the similarity matrix;
Step 1 is specifically as follows:
1.1 Obtain the adjacency matrix $A$ of the complex network, written $A=(a_{ij})_{N\times N}$;
The complex network graph $G=(V,E)$ is a network topology graph with $N$ nodes $v_i$ ($i=1,2,\ldots,N$) and $M$ edges $e_k$ ($k=1,2,\ldots,M$), where $V=(v_1,v_2,\ldots,v_N)$ is the set of network nodes, $E=(e_1,e_2,\ldots,e_M)$ is the set of network edges, and an edge $e_k$ is written $e_{ij}$ after the two nodes $v_i$, $v_j$ that it connects;
The adjacency matrix $A$ is the $N\times N$ matrix constructed for this complex network graph $G$ with $N$ nodes $v_i$: $a_{ij}=1$ when an edge connects node $v_i$ and node $v_j$, and $a_{ij}=0$ when no edge directly connects nodes $i$ and $j$, where each element $a_{ij}$ of $A$ expresses the edge relationship between nodes, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$, and $a_{ij}=0$ when $i=j$;
The similarity matrix $S$ is the $N\times N$ matrix constructed for this complex network graph $G$: $s_{ij}=(A_i,A_j)$ when an edge connects node $v_i$ and node $v_j$, and $s_{ij}=0$ when no edge directly connects nodes $i$ and $j$, where $s_{ij}$ is an element of $S$, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$, $s_{ij}=0$ when $i=j$, and $A_i$ denotes the vector formed by the elements of the $i$-th row of $A$;
The distance matrix $D$ is the $N\times N$ matrix constructed for this complex network graph $G$ with $d_{ij}=(s_{ii}+s_{jj}-2s_{ij})^{1/2}$, where $s_{ij}$ is an element of $S$, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$.
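As an illustration of step 1, the following Python sketch builds $S$ and $D$ from a 0/1 adjacency matrix. The function name `similarity_and_distance` is hypothetical. One deliberate deviation is an assumption of this sketch, not a statement of the method: the diagonal term $s_{ii}$ is taken as the inner product $(A_i,A_i)$, because with $s_{ii}=0$ as written in the text the quantity under the square root, $s_{ii}+s_{jj}-2s_{ij}$, would be negative whenever $s_{ij}>0$.

```python
import math

def similarity_and_distance(adj):
    """Return (S, D) for a symmetric 0/1 adjacency matrix given as a list of lists."""
    n = len(adj)

    def dot(i, j):
        # Inner product of rows i and j; for i != j this counts common neighbours.
        return sum(adj[i][k] * adj[j][k] for k in range(n))

    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                s[i][j] = dot(i, i)      # assumption: diagonal is (A_i, A_i), not 0
            elif adj[i][j] == 1:
                s[i][j] = dot(i, j)      # connected pair: s_ij = (A_i, A_j)
            # unconnected pair: s_ij stays 0
    d = [[math.sqrt(s[i][i] + s[j][j] - 2 * s[i][j]) for j in range(n)]
         for i in range(n)]
    return s, d
```

With this diagonal, the radicand for a connected pair equals the sum of the two degrees minus twice the number of common neighbours, which cannot be negative.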
Step 2: use multidimensional scaling (MDS) to map each network node to node coordinates in a low-dimensional Euclidean space;
Step 2 is specifically as follows:
2.1 Decompose the matrix $D$ using formula (1), which is as follows:
$D=U\Lambda U^{T}$ (1)
where $\Lambda=\mathrm{diag}\{\lambda_1,\lambda_2,\ldots,\lambda_N\}$ is a diagonal matrix whose elements $\lambda_i$ are the eigenvalues of $D$, sorted in descending order so that, without loss of generality, $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_N$. The $i$-th column of $U$ is the eigenvector corresponding to $\lambda_i$.
2.2 If $p$ of the eigenvalues $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_N$ are greater than zero, $p\in[1,N]$, denote the eigenvectors corresponding to those $p$ eigenvalues by $u_1,u_2,\ldots,u_p$ and let them form the columns of a matrix $P$.
2.3 Take the $N$ row vectors of $P$ as the coordinates of the $N$ network nodes in the $p$-dimensional space, i.e. $(x_1,x_2,\ldots,x_N)$.
Step 3: select K initial seed communities by the seed screening sub-method based on local expansion, and compute the center coordinates of these seed communities from the Euclidean coordinates of the network nodes as the K initial seed nodes;
Step 3 is specifically as follows:
The seed screening sub-method based on local expansion has the following features:
The method comprises the following steps:
Input: the adjacency matrix $A$ of a complex network graph $G$ with $N$ nodes, and the number of seed nodes K.
Output: the coordinates of the K seed nodes.
Step ①: set a flag $\delta_i$ ($i=1,2,\ldots,N$) for each node in the complex network $G$. The flag takes the value 0 or 1, where 0 indicates that the node has not been marked and 1 indicates that it has been marked.
Step ②: search all complete subgraphs in the network, compute the mean of the weights in each complete subgraph found according to formula (4-8), and sort the subgraphs in descending order of that mean, denoted $G_1\ge G_2\ge\ldots\ge G_M$.
A complete subgraph is a subgraph with the following property: if the subgraph $G'=(V',E')$ has $N'$ nodes $v_i$ ($i=1,2,\ldots,N'$), then its number of edges is $M'=N'(N'-1)/2$.
Step ③: among the nodes whose flag is currently 0, select the node with the largest degree and denote it $a$. The complete subgraphs among $G_1,G_2,\ldots,G_M$ that contain node $a$ form a subset; from this subset, choose the complete subgraph $G_i$ that satisfies the following conditions and take its center coordinate as an initial seed node:
(1) the flags of all nodes in $G_i$ are 0;
(2) $G_i\ge G_j$ for any $G_j$ in this subset.
Step ④: initialize a node subset $\Omega$ and, with the initial seed community $G_i$ selected in step ③ as the core, expand the scope of $G_i$ according to formula (4-9), the edge tightness $F_C=\dfrac{k_{in}^{C}}{(k_{in}^{C})^{\alpha}+(k_{out}^{C})^{\beta}}$.
Here $\alpha$ and $\beta$ are real numbers between 0 and 1. For an immediate neighbor node $v$ of $G_i$, compute by formula (4-9) the edge tightness $F_{G_i+v}$ of the subgraph obtained by adding $v$ to $G_i$: if $F_{G_i+v}\ge F_{G_i}$, node $v$ is added to $G_i$; if $F_{G_i+v}<F_{G_i}$, node $v$ is not added to $G_i$. Every neighbor node that increases the edge tightness of $G_i$ is put into the node subset $\Omega$ until the value of formula (4-9) no longer increases.
Step ⑤: set the flag of every node in the set $\Omega$ to 1, indicating that the node may belong to the same community as an existing seed node, so that it is no longer considered in the subsequent seed search.
Step ⑥: if all K seed communities have been initialized, or no unmarked complete subgraph remains among $G_1,G_2,\ldots,G_M$, go to step ⑦; otherwise, go to step ③.
K is a preset positive natural number whose value range is $[2,N]$.
Step ⑦: if the number $\theta$ of seed nodes found so far is less than K, select the remaining seed nodes by the farthest-mutual-distance principle.
Step ⑧: compute the center coordinates of the nodes of each of the K seed communities after MDS mapping and use them as the coordinates $(x'_1,x'_2,\ldots,x'_K)$ of the initial seed nodes for K-means.
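The local expansion in step ④ can be sketched in Python as follows. This is a hedged illustration only. The names `fitness` and `expand` are hypothetical; the graph is assumed to be a dict mapping each node to the set of its neighbours; $k_{in}$ is assumed to be the number of edges inside the node set and $k_{out}$ the number of edges leaving it (the text does not define them legibly); and the tightness is assumed to have the form $k_{in}/(k_{in}^{\alpha}+k_{out}^{\beta})$ as read from the formula in this text.

```python
def fitness(adj, members, alpha=1.0, beta=1.0):
    """Assumed edge tightness k_in / (k_in**alpha + k_out**beta) of a node set."""
    # Each internal edge is seen from both endpoints, hence the division by 2.
    k_in = sum(1 for u in members for v in adj[u] if v in members) // 2
    k_out = sum(1 for u in members for v in adj[u] if v not in members)
    denom = k_in ** alpha + k_out ** beta
    return k_in / denom if denom else 0.0

def expand(adj, core, alpha=1.0, beta=1.0):
    """Greedily add immediate neighbours that do not lower the tightness."""
    members = set(core)
    changed = True
    while changed:
        changed = False
        frontier = sorted({v for u in members for v in adj[u]} - members)
        for v in frontier:
            before = fitness(adj, members, alpha, beta)
            if fitness(adj, members | {v}, alpha, beta) >= before:
                members.add(v)
                changed = True
    return members
```

On two triangles `{0, 1, 2}` and `{3, 4, 5}` joined by the edge `2-3`, expanding the core `{0, 1}` with `alpha = beta = 1` absorbs node 2 (tightness rises from 1/3 to 3/4) and rejects node 3 (tightness would fall to 2/3). The returned set includes the core itself; whether $\Omega$ in the method also contains the core is not stated here.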
Step 4: perform K-means clustering with the K seeds found in step 3 as input parameters to obtain the community structure partition under the current value of K;
Step 4 is specifically as follows:
The K-means algorithm comprises the following steps:
Input: the number of clusters K, the coordinates $(x_1,x_2,\ldots,x_N)$ of the $N$ nodes, and the coordinates $(x'_1,x'_2,\ldots,x'_K)$ of the K initial seed nodes.
Output: K clusters satisfying the minimum-variance criterion.
Step (1): take $(x'_1,x'_2,\ldots,x'_K)$ as the initial cluster centers;
Step (2): repeat steps (3) to (4) until no cluster changes;
Step (3): in the $t$-th iteration, set the K sets $Z_i$ ($i=1,2,\ldots,K$) to the empty set. Every node coordinate $x''$ in the network is assigned to one of the K classes as follows: for a class $i$ ($i=1,2,\ldots,K$), if $\|x''-x'_i\|<\|x''-x'_j\|$ for all $j\ne i$ ($j=1,2,\ldots,K$), then $x''\in Z_i$.
Here $t$ is a natural number starting from 1, and $\|a-b\|$ denotes the Euclidean distance between coordinates $a$ and $b$, where $a$ and $b$ are $p$-dimensional vectors.
Step (4): from the result of step (3), compute the $j$-th component of the new center coordinate of class $i$ as $x'_{ij}=\frac{1}{|Z_i|}\sum_{x''\in Z_i}x''_j$,
where $|Z_i|$ is the number of elements in class $Z_i$. The center coordinate of class $i$ is then $(x'_{i1},x'_{i2},\ldots,x'_{iK})$. The value $J'$ is computed from the newly obtained center coordinates.
If $|J'-J|<\delta$, the procedure stops and outputs the clustering result $Z_i$ ($i=1,2,\ldots,K$); otherwise, set $J=J'$ and go to step (3).
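Steps (1) to (4) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the definitive procedure: the function name `k_means` and the `max_iter` safeguard are hypothetical, the new center is assumed to be the component-wise mean of each class, and the objective $J$ is assumed to be the within-cluster sum of squared Euclidean distances, consistent with the "minimum-variance" output criterion but not confirmed by a legible formula in this text. The parameter `tol` plays the role of the threshold $\delta$.

```python
import math

def k_means(points, seeds, tol=1e-9, max_iter=100):
    """Cluster coordinate tuples around the given initial seed coordinates."""
    centres = [tuple(s) for s in seeds]            # step (1)
    prev_j = None
    clusters = [[] for _ in centres]
    for _ in range(max_iter):                      # step (2)
        clusters = [[] for _ in centres]           # step (3): empty the sets Z_i
        for x in points:
            nearest = min(range(len(centres)),
                          key=lambda c: math.dist(x, centres[c]))
            clusters[nearest].append(x)
        # step (4): component-wise mean; an empty class keeps its old centre
        centres = [
            tuple(sum(col) / len(z) for col in zip(*z)) if z else centres[i]
            for i, z in enumerate(clusters)
        ]
        # assumed objective J: within-cluster sum of squared distances
        j = sum(math.dist(x, centres[i]) ** 2
                for i, z in enumerate(clusters) for x in z)
        if prev_j is not None and abs(j - prev_j) < tol:
            break
        prev_j = j
    return clusters, centres
```

Ties in step (3) are resolved here by taking the lowest class index, a choice the strict inequality in the text leaves open.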
Step 5: for each $K\in[2,N-1]$, run steps 3 to 4 to obtain the community structure partition under each value of K. Compute the modularity of the partition result for each value of K, and choose the node clustering result with the largest modularity as the optimal partition result. The nodes that belong to the same class in the optimal partition result are taken as one community structure, giving the final partition result.
Step 5 is specifically as follows:
The modularity formula is as follows: a community partition $C$ is applied to the complex network graph $G$, written $C=\{c_1,c_2,\ldots,c_p\}$, where each $c_i$ ($i=1,2,\ldots,p$) is a set of nodes in $G$, and the modularity $Q_C$ under the current community partition $C$ is computed by formula (2):
$Q_C=\dfrac{1}{2M}\sum_{i,j}\left(a_{ij}-\dfrac{k_i k_j}{2M}\right)\delta(c_m,c_n)$ (2)
In the formula, $k_i$ denotes the degree of network node $v_i$, i.e. the number of edges between $v_i$ and the other nodes in the network, $i=1,2,\ldots,N$;
$M$ denotes the number of edges in the network;
$c_m$ and $c_n$ denote the numbers of the communities to which nodes $v_i$ and $v_j$ respectively belong, $m\in[1,p]$, $n\in[1,p]$;
the function $\delta(c_m,c_n)$ is given by formula (3): $\delta(c_m,c_n)=1$ if $c_m=c_n$, and $\delta(c_m,c_n)=0$ otherwise. (3)
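The modularity computation can be sketched in Python as follows, assuming the standard Newman modularity built from the quantities defined above (the node degree, the edge count $M$, and the indicator $\delta$). The function name `modularity` and the `labels` list (the community number of each node) are hypothetical choices for this illustration.

```python
def modularity(adj, labels):
    """Modularity of a partition of an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists);
    labels[i] is the community number of node i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)                      # 2M: every edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:       # delta(c_m, c_n) = 1
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14, while placing all six nodes in one community gives 0, so step 5 would prefer the two-community partition.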
The present invention measures the similarity between the nodes of a complex network using the Pearson correlation coefficient formula and maps the network nodes to node coordinates in a low-dimensional Euclidean space by multidimensional scaling. On this basis, it proposes an improved K-means algorithm based on local expansion, which selects the cluster centers reasonably according to the topological features of the community structure in the complex network and clusters the network nodes belonging to the same community according to those features. Compared with the prior art, the present invention overcomes the poor precision of traditional K-means community mining methods, which results from choosing the seed nodes at random, improves the accuracy of community mining, and addresses the problem of mining community structure in complex networks.
Fig. 3 is a schematic diagram of the typical community structure obtained by the embodiment of the present invention on the karate club network. With reference to Fig. 3, this embodiment is described below using the Zachary karate club network, a classic social network data set, for community structure mining. The network contains 34 nodes and 78 edges, and $\alpha$ and $\beta$ are both set to 1. The steps are as follows:
Step 1: from the adjacency matrix of the complex network, compute the number of common immediate neighbor nodes of each pair of nodes to form the similarity matrix $S$, and compute the distance matrix $D$ between the nodes of the network from the similarity matrix;
Step 2: use multidimensional scaling (MDS) to map each network node to node coordinates in a 2-dimensional Euclidean space;
Step 3: select K initial seed communities by the seed screening sub-algorithm based on local expansion, and compute the center coordinates of these seed communities from the Euclidean coordinates of the network nodes as the K initial seed nodes. For example, when K=2, the two seed communities computed by the seed screening sub-algorithm are {1, 2, 4} and {30, 33, 34}, and the seed coordinates for K=2 are the center coordinates of the node coordinates of these two seed communities;
Step 4: perform K-means clustering with the K seeds found in step 3 as input parameters to obtain the community structure partition under the current value of K;
Step 5: for each $K\in[2,33]$, run steps 3 to 4 to obtain the community structure partition under each value of K. Compute the modularity of the partition result for each value of K, and choose the node clustering result with the largest modularity as the optimal partition result. The nodes that belong to the same class in the optimal partition result are taken as one community structure, giving the final partition result. In this example, the modularity of the community partition is largest when K=2, so that partition is the final result.
Device embodiment
An embodiment of the present invention provides a K-means community structure mining apparatus. Referring to Fig. 4, the apparatus comprises:
a computing unit, configured to select K initial seed communities by the seed screening method based on local expansion and to compute the center node of each initial seed community as an initial seed node, the coordinates of the center node serving as the coordinate values of a K-means clustering seed node, where $K\in[2,N-1]$;
an acquiring unit, configured to perform K-means clustering with the coordinates of the initial seed nodes as input parameters to obtain the community structure partition under the current value of K;
a processing unit, configured to obtain the community structure partitions under different values of K, compute the modularity of the partition result for each value of K, choose the initial-seed-node clustering result with the largest modularity as the optimal partition result, and take the initial seed nodes that belong to the same class in the optimal partition result as one community structure to obtain the final partition result.
By reasonably selecting the cluster centers as the initial seed nodes of the K-means clustering algorithm, and then using the K-means algorithm to cluster the network nodes in a low-dimensional Euclidean space so that network nodes belonging to the same community are grouped together, the present invention improves the precision of community mining and thereby the accuracy of subsequent system analysis.
Preferably, the apparatus of the embodiment of the present invention further comprises a mapping unit. The mapping unit is configured to compute, from the adjacency matrix $A$ of the complex network $G$, the number of common immediate neighbor nodes of each pair of network nodes to form the similarity matrix $S$, and to compute the distance matrix $D$ between the network nodes from the similarity matrix.
Here, the adjacency matrix $A$ is the $N \times N$ matrix constructed for a complex network graph $G$ with $N$ nodes $v_i$: $a_{ij}=1$ when an edge connects node $v_i$ and node $v_j$, and $a_{ij}=0$ when no edge directly connects nodes $i$ and $j$, where each element $a_{ij}$ of $A$ expresses the edge relationship between nodes, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$, and $a_{ij}=0$ when $i=j$. The similarity matrix $S$ is the $N \times N$ matrix constructed for the same graph $G$: $s_{ij}=(A_i,A_j)$ when an edge connects node $v_i$ and node $v_j$, and $s_{ij}=0$ when no edge directly connects nodes $i$ and $j$, where $s_{ij}$ is an element of $S$, $s_{ij}=0$ when $i=j$, and $A_i$ denotes the vector formed by the elements of the $i$-th row of $A$. The distance matrix $D$ is the $N \times N$ matrix constructed for the same graph $G$ with $d_{ij}=(s_{ii}+s_{jj}-2s_{ij})^{1/2}$, where $s_{ij}$ is an element of $S$, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$.
Multidimensional scaling (MDS) is used to map each network node to node coordinates in a low-dimensional Euclidean space. The MDS procedure specifically comprises: decomposing the matrix $D$ as $D=U\Lambda U^{T}$, where $\Lambda=\mathrm{diag}\{\lambda_1,\lambda_2,\ldots,\lambda_N\}$ is a diagonal matrix whose elements $\lambda_i$ are the eigenvalues of $D$, sorted in descending order so that, without loss of generality, $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_N$, and the $i$-th column of $U$ is the eigenvector corresponding to $\lambda_i$; if $p$ of these eigenvalues are greater than zero, $p\in[1,N]$, the eigenvectors corresponding to those $p$ eigenvalues are denoted $u_1,u_2,\ldots,u_p$ and form the columns of a matrix $P$; the $N$ row vectors of $P$ are taken as the coordinates of the $N$ network nodes in the $p$-dimensional space, i.e. $(x_1,x_2,\ldots,x_N)$;
Preferably, the computing unit of the embodiment of the present invention is specifically configured to: set a flag $\delta_i$ ($i=1,2,\ldots,N$) for each node in the complex network $G$, where 0 indicates that the node has not been marked and 1 indicates that it has been marked; search all complete subgraphs in the network, compute the mean of the weights in each complete subgraph found according to the formula for the mean weight, and sort the subgraphs in descending order of that mean, denoted $G_1\ge G_2\ge\ldots\ge G_M$; among the nodes whose flag is currently 0, select the node with the largest degree and denote it $a$, form a subset from the complete subgraphs among $G_1,G_2,\ldots,G_M$ that contain node $a$, where $r$ is an arbitrary natural number from 1 to $M$, and choose from this subset the complete subgraph $G_i$ satisfying condition (1) and condition (2), taking its center coordinate as an initial seed node. Condition (1): the flags of all nodes in $G_i$ are 0. Condition (2): $G_i\ge G_j$ for any $G_j$ in this subset.
A node subset $\Omega$ is initialized and, with the selected initial seed community $G_i$ as the core, the scope of $G_i$ is expanded according to the edge tightness $F_C=\dfrac{k_{in}^{C}}{(k_{in}^{C})^{\alpha}+(k_{out}^{C})^{\beta}}$, where $\alpha$ and $\beta$ are real numbers between 0 and 1. For an immediate neighbor node $v$ of $G_i$, let $G_i+v$ denote the subgraph obtained by adding $v$ to $G_i$ and $F_{G_i+v}$ its edge tightness: if $F_{G_i+v}\ge F_{G_i}$, node $v$ is added to $G_i$; otherwise it is not. Every neighbor node that increases the edge tightness of $G_i$ is put into $\Omega$ until the edge tightness no longer increases, and the flag of every node in $\Omega$ is then set to 1. If all K seed communities have been initialized, or no unmarked complete subgraph remains among $G_1,G_2,\ldots,G_M$, and the number $\theta$ of seed nodes found so far is less than K, the remaining seed nodes are selected by the farthest-mutual-distance principle. The center coordinates of the nodes of each of the K seed communities after MDS mapping are computed and used as the coordinates $(x'_1,x'_2,\ldots,x'_K)$ of the initial seed nodes for K-means.
The farthest-mutual-distance principle specifically comprises: let the coordinates of the $\theta'$ seed nodes found so far ($\theta'\in[\theta+1,K]$) be $y_1,y_2,\ldots,y_{\theta'}$, and let the coordinates of the network nodes other than the seed nodes be $o_1,o_2,\ldots,o_{N-\theta'}$; the coordinate $o'$ of the next seed node selected by this principle is then the one, among the coordinates of the network nodes other than $y_1,y_2,\ldots,y_{\theta'}$, that satisfies the corresponding selection formula, $o'\in\{o_1,o_2,\ldots,o_{N-\theta'}\}$. The center coordinate of each initial seed community is computed as follows: for a seed community $G_i$ found, $i=1,2,\ldots,K$, containing $g$ nodes with coordinates $y_1,y_2,\ldots,y_g$, the center coordinate of the seed community is $\frac{1}{g}\sum_{l=1}^{g}y_l$.
Preferably, the acquiring unit of the embodiment of the present invention is specifically configured to perform the following steps. Step (1): take $(x'_1,x'_2,\ldots,x'_K)$ as the initial cluster centers and set the K sets $Z_i$, $i=1,2,\ldots,K$, to the empty set. Step (2): repeat steps (3) to (4) until no cluster changes. Step (3): in the $t$-th iteration, every node coordinate $x''$ in the network is assigned to one of the K classes as follows: for a class $i$, $i=1,2,\ldots,K$, if $\|x''-x'_i\|<\|x''-x'_j\|$ for all $j\ne i$, $j=1,2,\ldots,K$, then $x''\in Z_i$, where $t$ is a natural number starting from 1, $\|a-b\|$ is the Euclidean distance between coordinates $a$ and $b$, and $a$ and $b$ are $p$-dimensional vectors. Step (4): from the result of step (3), compute the $j$-th component of the new center coordinate of class $i$ as $x'_{ij}=\frac{1}{|Z_i|}\sum_{x''\in Z_i}x''_j$, where $|Z_i|$ is the number of elements in class $Z_i$. The center coordinate of class $i$ is then $(x'_{i1},x'_{i2},\ldots,x'_{iK})$. The value $J'$ is computed from the newly obtained center coordinates; if $|J'-J|<\delta$, the procedure stops and outputs the clustering result $Z_i$ ($i=1,2,\ldots,K$); otherwise, set $J=J'$ and go to step (3).
Preferably, the processing unit of the embodiment of the present invention is specifically configured to apply a community partition $C$ to the complex network graph $G$, written $C=\{c_1,c_2,\ldots,c_p\}$, where $p$ is an arbitrary natural number from 1 to $N$ and each $c_i$ ($i=1,2,\ldots,p$) is a set of nodes in $G$, and to compute the modularity under the current community partition. In the modularity formula, the degree of network node $v_i$ is the number of edges between $v_i$ and the other nodes in the network, $i=1,2,\ldots,N$; $M$ is the number of edges in the network; and $c_m$ and $c_n$ are the numbers of the communities to which nodes $v_i$ and $v_j$ respectively belong, $m\in[1,p]$, $n\in[1,p]$.
It should be noted that the relevant parts of this apparatus embodiment can be understood with reference to the corresponding parts of the method embodiment and are not repeated here.
In summary, by reasonably selecting the cluster centers as the initial seed nodes of the K-means clustering algorithm, and then using the K-means algorithm to cluster the network nodes in a low-dimensional Euclidean space so that network nodes belonging to the same community are grouped together, the present invention improves the precision of community mining and thereby the accuracy of subsequent system analysis.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims (10)

1. A K-means community structure mining method, the method comprising:
selecting K initial seed communities by a seed screening method based on local expansion, and computing the center node of each initial seed community as an initial seed node, the coordinates of the center node serving as the coordinate values of a K-means clustering seed node, where $K\in[2,N-1]$;
performing K-means clustering with the coordinates of the initial seed nodes as input parameters to obtain the community structure partition under the current value of K;
obtaining the community structure partitions under different values of K, computing the modularity of the partition result for each value of K, choosing the initial-seed-node clustering result with the largest modularity as the optimal partition result, and taking the initial seed nodes that belong to the same class in the optimal partition result as one community structure to obtain the final partition result.
2. The method according to claim 1, characterized by further comprising:
computing, from the adjacency matrix $A$ of the complex network $G$, the number of common immediate neighbor nodes of each pair of network nodes to form the similarity matrix $S$, and computing the distance matrix $D$ between the network nodes from the similarity matrix;
wherein the adjacency matrix $A$ is the $N \times N$ matrix constructed for a complex network graph $G$ with $N$ nodes $v_i$: $a_{ij}=1$ when an edge connects node $v_i$ and node $v_j$, and $a_{ij}=0$ when no edge directly connects nodes $i$ and $j$, where each element $a_{ij}$ of $A$ expresses the edge relationship between nodes, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$, and $a_{ij}=0$ when $i=j$; the similarity matrix $S$ is the $N \times N$ matrix constructed for the same graph $G$: $s_{ij}=(A_i,A_j)$ when an edge connects node $v_i$ and node $v_j$, and $s_{ij}=0$ when no edge directly connects nodes $i$ and $j$, where $s_{ij}$ is an element of $S$, $s_{ij}=0$ when $i=j$, and $A_i$ denotes the vector formed by the elements of the $i$-th row of $A$; the distance matrix $D$ is the $N \times N$ matrix constructed for the same graph $G$ with $d_{ij}=(s_{ii}+s_{jj}-2s_{ij})^{1/2}$, where $s_{ij}$ is an element of $S$, $i=1,2,\ldots,N$, $j=1,2,\ldots,N$;
using multidimensional scaling (MDS) to map each network node to node coordinates in a low-dimensional Euclidean space, the MDS procedure specifically comprising: decomposing the matrix $D$ as $D=U\Lambda U^{T}$, where $\Lambda=\mathrm{diag}\{\lambda_1,\lambda_2,\ldots,\lambda_N\}$ is a diagonal matrix whose elements $\lambda_i$ are the eigenvalues of $D$, sorted in descending order so that, without loss of generality, $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_N$, and the $i$-th column of $U$ is the eigenvector corresponding to $\lambda_i$; if $p$ of these eigenvalues are greater than zero, $p\in[1,N]$, denoting the eigenvectors corresponding to those $p$ eigenvalues by $u_1,u_2,\ldots,u_p$ and letting them form the columns of a matrix $P$; and taking the $N$ row vectors of $P$ as the coordinates of the $N$ network nodes in the $p$-dimensional space, i.e. $(x_1,x_2,\ldots,x_N)$;
wherein selecting K initial seed communities by the seed screening method based on local expansion, and computing the center node of each initial seed community as an initial seed node, specifically comprises:
selecting K initial seed communities by the seed screening method based on local expansion, and computing the center coordinate of each initial seed community from the Euclidean coordinates of the network nodes as the initial seed node of that community.
3. The method according to claim 2, characterized in that selecting K initial seed communities by the seed screening method based on local expansion, and calculating the center coordinate of each initial seed community from the Euclidean coordinates of the network nodes as the initial seed node of that community, specifically comprises:
A flag bit $\delta_i$ ($i = 1, 2, \ldots, N$) is set for each node in the complex network G, where 0 indicates that the node has not been marked and 1 indicates that the node has been marked;
All complete subgraphs in the network are searched, the mean weight of each complete subgraph found is calculated according to [formula not reproduced], and the subgraphs are sorted in descending order of this value, denoted $G_1 \ge G_2 \ge \ldots \ge G_M$;
Among the nodes whose flag bit is currently 0, the node with the largest degree is selected; let this node be a. From $G_1, G_2, \ldots, G_M$, the complete subgraphs containing node a are selected to form a subset [set expression not reproduced], where r is any natural number from 1 to M; within this subset, the complete subgraph $G_i$ that satisfies condition (1) and condition (2) is chosen, and its center node coordinate is used as an initial seed node:
Condition (1): the flag bits of all nodes in $G_i$ are 0;
Condition (2): $G_i \ge G_j$ for any other complete subgraph $G_j$ in this subset;
A node subset Ω is initialized. With the selected initial seed community $G_i$ as the core, the scope of $G_i$ is expanded according to [formula not reproduced], where α and β are real numbers between 0 and 1. For a direct neighbor node v of $G_i$, let $G_i + v$ denote the subgraph obtained by adding v to $G_i$, and let the edge tightness of this subgraph be obtained according to [formula not reproduced]; if [condition not reproduced], node v is added to subgraph $G_i$; otherwise node v is not added to $G_i$. Every neighbor node that increases the edge density of subgraph $G_i$ is placed into the node subset Ω, until the value of the formula no longer increases, and the flag bit of each node in the set Ω is then set to 1;
If all K seed communities have been initialized, or no unmarked complete subgraph remains in $G_1, G_2, \ldots, G_M$, and the number θ of seed nodes found so far is less than K, the remaining seed nodes are selected by the farthest-mutual-distance principle;
The farthest-mutual-distance principle specifically comprises: suppose $\theta'$ ($\theta' \in [\theta + 1, K]$) seed nodes have been found so far, with coordinates $y_1, y_2, \ldots, y_{\theta'}$, and the coordinates of the nodes in the network other than the seed nodes are $o_1, o_2, \ldots, o_{N-\theta'}$; then the coordinate o of the next seed node selected by the farthest-mutual-distance principle is that of the node, among the node coordinates in the network other than $y_1, y_2, \ldots, y_{\theta'}$, that satisfies the following formula: [formula not reproduced], with $o' \in \{o_1, o_2, \ldots, o_{N-\theta'}\}$. The center coordinate of the nodes of each of the K seed communities after MDS mapping is calculated and used as the initial seed node coordinates $x'_1, x'_2, \ldots, x'_K$ of the K-means;
The center coordinate of each initial seed community is calculated as follows: for a seed community $G_i$ that has been found, $i = 1, 2, \ldots, K$, if it contains g nodes with coordinates $y_1, y_2, \ldots, y_g$, the center coordinate of the seed community is the mean $\frac{1}{g}\sum_{l=1}^{g} y_l$.
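The center-coordinate calculation and the fallback selection of remaining seeds can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names `centroid` and `farthest_node` are hypothetical, and because the selection formula for the farthest-mutual-distance principle is not reproduced in the text, the sketch assumes a common max-min rule (pick the candidate whose nearest existing seed is farthest away).

```python
import math


def centroid(points):
    """Center coordinate of a seed community: the mean of its g node coordinates."""
    g = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / g for d in range(dim)]


def farthest_node(seeds, candidates):
    """Pick the next seed among non-seed coordinates.

    ASSUMPTION: the selection formula is missing from the text, so this uses
    the max-min rule: maximize the distance to the closest existing seed.
    """
    return max(candidates, key=lambda o: min(math.dist(o, y) for y in seeds))


# Example: one seed community of three nodes, then one extra seed chosen
# among the remaining node coordinates.
community = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
center = centroid(community)
extra = farthest_node([[0.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]])
```

If the intended rule maximizes a different quantity (for example the sum of distances to all existing seeds), only the `key` function would change.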
4. The method according to claim 3, characterized in that performing K-means clustering with the coordinates of the initial seed nodes as input parameters, to obtain the community structure division under the current value of K, specifically comprises:
Step (1): take $(x'_1, x'_2, \ldots, x'_K)$ as the initial cluster centers and set each of the K sets $Z_i$ to the empty set, where $i = 1, 2, \ldots, K$;
Step (2): repeat steps (3) to (4) until no cluster changes;
Step (3): in the t-th iteration, the coordinate $x''$ of any node in the network is assigned to one of the K classes as follows: for a class i, where $i = 1, 2, \ldots, K$, if $\|x'' - x'_i\| < \|x'' - x'_j\|$ for all $j \ne i$, where $j = 1, 2, \ldots, K$, then $x'' \in Z_i$. Here t is a natural number starting from 1, and $\|a - b\|$ is the Euclidean distance between coordinates a and b; if a and b are p-dimensional vectors, $\|a - b\| = \sqrt{\sum_{l=1}^{p}(a_l - b_l)^2}$;
Step (4): from the result of step (3), the j-th component of the new center coordinate of the i-th class is calculated as $x'_{ij} = \frac{1}{|Z_i|}\sum_{x \in Z_i} x_j$, where $|Z_i|$ is the number of elements in class $Z_i$; the center coordinate of the i-th class is then $(x'_{i1}, x'_{i2}, \ldots, x'_{iK})$. The value of J′ is calculated from the newly obtained center coordinates as [formula not reproduced]. If $|J' - J| < \delta$, the procedure terminates and the clustering result $Z_i$ ($i = 1, 2, \ldots, K$) is output; otherwise, set $J = J'$ and go to step (3).
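Steps (1) to (4) can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the function name `k_means`, the `max_iter` safety cap, and the tie-breaking rule are hypothetical, and since the formula for J′ is not reproduced in the text, the sketch assumes J is the usual sum of squared Euclidean distances from each node to its class center.

```python
import math


def k_means(points, centers, delta=1e-9, max_iter=100):
    """K-means seeded with given initial centers (step 1)."""
    centers = [list(c) for c in centers]
    prev_j = math.inf
    clusters = [[] for _ in centers]
    for _ in range(max_iter):  # step (2): loop until convergence
        # Step (3): assign every coordinate to its nearest center.
        # Ties go to the lowest class index (an assumption; the claim uses strict <).
        clusters = [[] for _ in centers]
        for x in points:
            i = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
            clusters[i].append(x)
        # Step (4): recompute each center as the component-wise mean of its class.
        for i, members in enumerate(clusters):
            if members:  # an empty class keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        # ASSUMPTION: J is the sum of squared distances to the class centers.
        j = sum(math.dist(x, centers[i]) ** 2
                for i, members in enumerate(clusters) for x in members)
        if abs(prev_j - j) < delta:
            break
        prev_j = j
    return clusters, centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters, centers = k_means(points, [[0.0, 0.0], [10.0, 10.0]])
```

Seeding with centers taken from the seed communities, rather than random points, is what distinguishes this use of K-means from the textbook version; the loop itself is unchanged.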
5. The method according to any one of claims 1-4, characterized in that obtaining the community structure divisions under different values of K, calculating the modularity of the community division result for each value of the parameter K, selecting the initial seed node clustering result that maximizes the modularity as the optimal division result, and taking the initial seed nodes that belong to the same class in the optimal division result as one community structure to obtain the final division result, specifically comprises:
A community division C is applied to the complex network graph G, denoted $C = \{c_1, c_2, \ldots, c_p\}$, where p is any natural number from 1 to N and $c_i$ ($i = 1, 2, \ldots, p$) is a set of nodes in the complex network graph G. The modularity under the current community division is calculated as $Q = \frac{1}{2M}\sum_{i,j}\left(a_{ij} - \frac{k_i k_j}{2M}\right)\delta(c_m, c_n)$, where $k_i$ is the number of edges between network node $v_i$ and the other nodes in the network, $i = 1, 2, \ldots, N$; M is the number of edges in the network; $c_m$ and $c_n$ are the numbers of the communities to which nodes $v_i$ and $v_j$ belong, with $m \in [1, p]$ and $n \in [1, p]$; and $\delta(c_m, c_n) = 1$ if $c_m = c_n$ and 0 otherwise.
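The modularity calculation and the choice of the best division can be sketched as below. This is a minimal sketch that assumes the standard Newman modularity, which matches the quantities defined above (node degrees, edge count M, community numbers); the function name `modularity` and the label-list representation of a division are hypothetical.

```python
def modularity(adj, labels):
    """Modularity of a division, assuming the standard Newman definition.

    adj    : symmetric 0/1 adjacency matrix (list of lists)
    labels : labels[i] is the community number of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(c_m, c_n) = 1
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles joined by a single edge (nodes 2 and 3).
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
candidates = [
    [0, 0, 0, 0, 0, 0],  # everything in one community
    [0, 0, 0, 1, 1, 1],  # one community per triangle
]
# Keep the division with the largest modularity, as in the selection over K.
best = max(candidates, key=lambda labels: modularity(adj, labels))
```

In the method, each candidate would be the K-means result for one value of K; here the two candidates are written by hand only to keep the example short.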
6. A K-means community structure mining apparatus, characterized by comprising:
A computing unit, configured to select K initial seed communities by the seed screening method based on local expansion and to calculate the center node of each initial seed community as an initial seed node, the center node coordinates serving as the coordinate values of the K-means clustering seed nodes, where K ∈ [2, N-1];
An acquiring unit, configured to perform K-means clustering with the coordinates of the initial seed nodes as input parameters, to obtain the community structure division under the current value of K;
A processing unit, configured to obtain the community structure divisions under different values of K, calculate the modularity of the community division result for each value of the parameter K, select the initial seed node clustering result that maximizes the modularity as the optimal division result, and take the initial seed nodes that belong to the same class in the optimal division result as one community structure, to obtain the final division result.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises a mapping unit;
The mapping unit is configured to calculate, from the adjacency matrix A of the complex network G, the number of identical direct neighbor nodes between each pair of network nodes to form a similarity matrix S, and to calculate, from the similarity matrix, the distance matrix D between the network nodes;
The adjacency matrix A is an N × N matrix constructed for the complex network graph G having N nodes $v_i$: when an edge connects node $v_i$ and node $v_j$, $a_{ij} = 1$; when there is no direct edge between nodes i and j, $a_{ij} = 0$; when $i = j$, $a_{ij} = 0$; here $a_{ij}$ is an element of A and represents the edge relation between nodes, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$;
The similarity matrix S is an N × N matrix constructed for the complex network graph G having N nodes $v_i$: when an edge connects node $v_i$ and node $v_j$, $s_{ij} = (A_i, A_j)$; when there is no direct edge between nodes i and j, $s_{ij} = 0$; when $i = j$, $s_{ij} = 0$; here $s_{ij}$ is an element of S, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$, and $A_i$ denotes the vector formed by the elements of the i-th row of matrix A;
The distance matrix D is an N × N matrix constructed for the complex network graph G having N nodes $v_i$, with $d_{ij} = (s_{ii} + s_{jj} - 2 s_{ij})^{1/2}$, where $s_{ij}$ is an element of S, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$;
Multidimensional scaling (MDS) is used to map each network node to a node coordinate in a low-dimensional Euclidean space. The MDS specifically comprises: decomposing the matrix D as $D = U \Lambda U^{T}$, where $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ is a diagonal matrix and each element $\lambda_i$ is an eigenvalue of D; without loss of generality, the eigenvalues are sorted in descending order so that $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N$, and the i-th column of U is the eigenvector corresponding to eigenvalue $\lambda_i$; if p of the eigenvalues are greater than zero, $p \in [1, N]$, the eigenvectors corresponding to these p eigenvalues are denoted $u_1, u_2, \ldots, u_p$ and form the columns of a matrix P, and the N row vectors of P give the coordinates of the N nodes of the network in the p-dimensional space, i.e. $(x_1, x_2, \ldots, x_N)$;
The computing unit is specifically configured to select K initial seed communities by the seed screening method based on local expansion, and to calculate the center coordinate of each initial seed community from the Euclidean coordinates of the network nodes as the initial seed node of that community.
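The construction of the similarity and distance matrices from the adjacency matrix can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it makes one explicit assumption. Read literally, $s_{ii} = 0$ would leave a non-positive value under the square root in $d_{ij}$, so the sketch takes the $s_{ii}$ and $s_{jj}$ terms of the distance formula to be the self inner products $(A_i, A_i)$, which equal the node degrees. The eigen-decomposition step of MDS is left out to keep the example dependency-free.

```python
import math


def similarity_matrix(adj):
    """s_ij = (A_i, A_j), the number of common neighbors, for connected i != j; else 0."""
    n = len(adj)
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j] == 1:
                s[i][j] = sum(a * b for a, b in zip(adj[i], adj[j]))
    return s


def distance_matrix(adj, s):
    """d_ij = sqrt(s_ii + s_jj - 2 s_ij), with zeros on the diagonal.

    ASSUMPTION: s_ii is taken as the self inner product (A_i, A_i) so that
    the radicand is non-negative; this is not stated in the text.
    """
    n = len(adj)
    self_sim = [sum(a * a for a in row) for row in adj]
    return [
        [0.0 if i == j else math.sqrt(self_sim[i] + self_sim[j] - 2 * s[i][j])
         for j in range(n)]
        for i in range(n)
    ]


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
s = similarity_matrix(adj)
d = distance_matrix(adj, s)
```

Nodes that share many neighbors end up close together in D, which is what lets the later clustering step group densely connected nodes.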
8. The apparatus according to claim 7, characterized in that
The computing unit is specifically configured to carry out the following:
A flag bit $\delta_i$ ($i = 1, 2, \ldots, N$) is set for each node in the complex network G, where 0 indicates that the node has not been marked and 1 indicates that the node has been marked;
All complete subgraphs in the network are searched, the mean weight of each complete subgraph found is calculated according to [formula not reproduced], and the subgraphs are sorted in descending order of this value, denoted $G_1 \ge G_2 \ge \ldots \ge G_M$;
Among the nodes whose flag bit is currently 0, the node with the largest degree is selected; let this node be a. From $G_1, G_2, \ldots, G_M$, the complete subgraphs containing node a are selected to form a subset [set expression not reproduced], where r is any natural number from 1 to M; within this subset, the complete subgraph $G_i$ that satisfies condition (1) and condition (2) is chosen, and its center node coordinate is used as an initial seed node: condition (1): the flag bits of all nodes in $G_i$ are 0; condition (2): $G_i \ge G_j$ for any other complete subgraph $G_j$ in this subset;
A node subset Ω is initialized. With the selected initial seed community $G_i$ as the core, the scope of $G_i$ is expanded according to [formula not reproduced], where α and β are real numbers between 0 and 1. For a direct neighbor node v of $G_i$, let $G_i + v$ denote the subgraph obtained by adding v to $G_i$, and let the edge tightness of this subgraph be obtained according to [formula not reproduced]; if [condition not reproduced], node v is added to subgraph $G_i$; otherwise node v is not added to $G_i$. Every neighbor node that increases the edge density of subgraph $G_i$ is placed into the node subset Ω, until the value of the formula no longer increases, and the flag bit of each node in the set Ω is then set to 1;
If all K seed communities have been initialized, or no unmarked complete subgraph remains in $G_1, G_2, \ldots, G_M$, and the number θ of seed nodes found so far is less than K, the remaining seed nodes are selected by the farthest-mutual-distance principle;
The farthest-mutual-distance principle specifically comprises: suppose $\theta'$ ($\theta' \in [\theta + 1, K]$) seed nodes have been found so far, with coordinates $y_1, y_2, \ldots, y_{\theta'}$, and the coordinates of the nodes in the network other than the seed nodes are $o_1, o_2, \ldots, o_{N-\theta'}$; then the coordinate o of the next seed node selected by the farthest-mutual-distance principle is that of the node, among the node coordinates in the network other than $y_1, y_2, \ldots, y_{\theta'}$, that satisfies the following formula: [formula not reproduced], with $o' \in \{o_1, o_2, \ldots, o_{N-\theta'}\}$. The center coordinate of the nodes of each of the K seed communities after MDS mapping is calculated and used as the initial seed node coordinates $x'_1, x'_2, \ldots, x'_K$ of the K-means;
The center coordinate of each initial seed community is calculated as follows: for a seed community $G_i$ that has been found, $i = 1, 2, \ldots, K$, if it contains g nodes with coordinates $y_1, y_2, \ldots, y_g$, the center coordinate of the seed community is the mean $\frac{1}{g}\sum_{l=1}^{g} y_l$.
9. The apparatus according to claim 8, characterized in that
The acquiring unit is specifically configured to carry out the following:
Step (1): take $(x'_1, x'_2, \ldots, x'_K)$ as the initial cluster centers and set each of the K sets $Z_i$ to the empty set, where $i = 1, 2, \ldots, K$;
Step (2): repeat steps (3) to (4) until no cluster changes;
Step (3): in the t-th iteration, the coordinate $x''$ of any node in the network is assigned to one of the K classes as follows: for a class i, where $i = 1, 2, \ldots, K$, if $\|x'' - x'_i\| < \|x'' - x'_j\|$ for all $j \ne i$, where $j = 1, 2, \ldots, K$, then $x'' \in Z_i$. Here t is a natural number starting from 1, and $\|a - b\|$ is the Euclidean distance between coordinates a and b; if a and b are p-dimensional vectors, $\|a - b\| = \sqrt{\sum_{l=1}^{p}(a_l - b_l)^2}$;
Step (4): from the result of step (3), the j-th component of the new center coordinate of the i-th class is calculated as $x'_{ij} = \frac{1}{|Z_i|}\sum_{x \in Z_i} x_j$, where $|Z_i|$ is the number of elements in class $Z_i$; the center coordinate of the i-th class is then $(x'_{i1}, x'_{i2}, \ldots, x'_{iK})$. The value of J′ is calculated from the newly obtained center coordinates as [formula not reproduced]. If $|J' - J| < \delta$, the procedure terminates and the clustering result $Z_i$ ($i = 1, 2, \ldots, K$) is output; otherwise, set $J = J'$ and go to step (3).
10. The apparatus according to any one of claims 6-9, characterized in that
The processing unit is specifically configured to apply a community division C to the complex network graph G, denoted $C = \{c_1, c_2, \ldots, c_p\}$, where p is any natural number from 1 to N and $c_i$ ($i = 1, 2, \ldots, p$) is a set of nodes in the complex network graph G, and to calculate the modularity under the current community division as $Q = \frac{1}{2M}\sum_{i,j}\left(a_{ij} - \frac{k_i k_j}{2M}\right)\delta(c_m, c_n)$, where $k_i$ is the number of edges between network node $v_i$ and the other nodes in the network, $i = 1, 2, \ldots, N$; M is the number of edges in the network; $c_m$ and $c_n$ are the numbers of the communities to which nodes $v_i$ and $v_j$ belong, with $m \in [1, p]$ and $n \in [1, p]$; and $\delta(c_m, c_n) = 1$ if $c_m = c_n$ and 0 otherwise.
CN201510784716.XA 2015-11-16 2015-11-16 K-mean community structure mining method and apparatus Pending CN105488247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510784716.XA CN105488247A (en) 2015-11-16 2015-11-16 K-mean community structure mining method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510784716.XA CN105488247A (en) 2015-11-16 2015-11-16 K-mean community structure mining method and apparatus

Publications (1)

Publication Number Publication Date
CN105488247A true CN105488247A (en) 2016-04-13

Family

ID=55675221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510784716.XA Pending CN105488247A (en) 2015-11-16 2015-11-16 K-mean community structure mining method and apparatus

Country Status (1)

Country Link
CN (1) CN105488247A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107465524A (en) * 2016-06-03 2017-12-12 国网辽宁省电力有限公司大连供电公司 A kind of powerline network Safety Analysis Method based on community structure
CN108122168A (en) * 2016-11-28 2018-06-05 中国科学技术大学先进技术研究院 Seed node screening technique and device in social activity network
CN109859054A (en) * 2018-12-13 2019-06-07 平安科技(深圳)有限公司 Network community method for digging, device, computer equipment and storage medium
US11468521B2 (en) 2016-10-31 2022-10-11 Tencent Technology (Shenzhen) Company Limited Social media account filtering method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107465524A (en) * 2016-06-03 2017-12-12 国网辽宁省电力有限公司大连供电公司 A kind of powerline network Safety Analysis Method based on community structure
US11468521B2 (en) 2016-10-31 2022-10-11 Tencent Technology (Shenzhen) Company Limited Social media account filtering method and apparatus
CN108122168A (en) * 2016-11-28 2018-06-05 中国科学技术大学先进技术研究院 Seed node screening technique and device in social activity network
CN108122168B (en) * 2016-11-28 2020-11-13 中国科学技术大学先进技术研究院 Method and device for screening seed nodes in social activity network
CN109859054A (en) * 2018-12-13 2019-06-07 平安科技(深圳)有限公司 Network community method for digging, device, computer equipment and storage medium
CN109859054B (en) * 2018-12-13 2024-03-05 平安科技(深圳)有限公司 Network community mining method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110532436B (en) Cross-social network user identity recognition method based on community structure
CN102594909B (en) Multi-objective community detection method based on spectrum information of common neighbour matrix
CN106503148B (en) A kind of table entity link method based on multiple knowledge base
CN104657418B (en) A kind of complex network propagated based on degree of membership obscures corporations' method for digging
CN105488247A (en) K-mean community structure mining method and apparatus
CN103020163A (en) Node-similarity-based network community division method in network
CN103020267B (en) Based on the complex network community structure method for digging of triangular cluster multi-label
CN102065019B (en) IP (Internet Protocol) core fast mapping method for network on chip based on region division
CN103888541A (en) Method and system for discovering cells fused with topology potential and spectral clustering
CN106326637A (en) Link predicting method based on local effective path degree
CN102253961A (en) Method for querying road network k aggregation nearest neighboring node based on Voronoi graph
CN106548418A (en) Power system small interference stability appraisal procedure
CN105938608A (en) Label-influence-driven semi-synchronous community discovery method
CN107742169A (en) A kind of Urban Transit Network system constituting method and performance estimating method based on complex network
CN104836711A (en) Construction method of command control network generative model
CN102819611B (en) Local community digging method of complicated network
Song et al. Fuzzy C-means clustering analysis based on quantum particle swarm optimization algorithm for the grouping of rock discontinuity sets
CN106780058A (en) The group dividing method and device of dynamic network
CN102779142A (en) Quick community discovery method based on community closeness
Wenli et al. Identifying node importance based on information entropy in complex networks
CN104657442B (en) Multiple target community discovery method based on Local Search
CN103559318B (en) The method that the object containing heterogeneous information network packet is ranked up
CN107276093B (en) The Probabilistic Load calculation method cut down based on scene
CN110796731A (en) River channel grid calculation order coding method
CN107578136A (en) The overlapping community discovery method extended based on random walk with seed

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160413