CN109886313A

CN109886313A - A dynamic graph clustering method based on density peaks

Info

Publication number: CN109886313A
Application number: CN201910080266.4A
Authority: CN
Inventors: 谷峪; 吴长发; 于戈
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2019-01-28
Filing date: 2019-01-28
Publication date: 2019-06-14

Abstract

The present invention provides a dynamic graph clustering method based on density peaks, which clusters dynamic graphs, returns clustering results in real time, and discovers cluster evolution events; the clustering results comprise the clusters, abnormal vertices, and bridge vertices in the graph. The method comprises a static graph clustering method and a dynamic graph clustering method and is divided into two stages: initialization and dynamic detection. In the initialization stage, the local density, dependent vertex, and dependency similarity of each vertex are calculated; the DP-Index is generated to improve the efficiency of the algorithm; a decision graph is generated, from which the density peak vertices and noise vertices are obtained; the cluster result set, abnormal vertex set, and bridge vertex set are obtained based on the density peak idea; and a dependency graph is created from the clustering results, laying the foundation for dynamic graph clustering. In the dynamic detection stage, the DP-Index and the dependency graph are updated as vertices and edges are inserted or deleted, and the clustering results and cluster evolution events are obtained from the dependency graph and its dynamic changes.

Description

A dynamic graph clustering method based on density peaks

Technical field

The invention belongs to the field of large-scale graph data processing and in particular relates to a dynamic graph clustering method based on density peaks.

Background

Because the graph model is highly expressive, many real-world applications model data and the relationships among them as a graph, in which vertices represent entities and edges represent the relationships between entities. With the proliferation of network applications such as social networks, information networks, citation networks, collaboration networks, e-commerce networks, communication networks, and biological networks, how to manage and analyze graph data effectively has attracted growing attention. Among these problems, graph clustering is a fundamental one and has been studied extensively.

Graph clustering is clustering applied to graph data. Its purpose is to partition the vertices of a network, according to the similarity between vertices, into several densely connected subgraphs, also called clusters, so that vertices within the same cluster are relatively densely connected and connections between different clusters are relatively sparse. Clusters in real networks represent sets of particular objects: for example, a cluster in a social network represents a real community formed around shared interests or background, and a cluster in a citation network represents related papers on the same topic. Discovering the clusters in these networks helps us understand and exploit the networks more effectively, for example by capturing strongly related documents in a citation network, finding social groups that share a social environment in a social network, and analyzing products that people often buy together when shopping online.

Many types of graph clustering methods have been proposed, but challenges remain in the following two respects:

First, identifying bridges and abnormal vertices

Although detecting clusters is important, detecting the bridges and abnormal vertices in a graph is also important. In a graph, each vertex plays a different role. Some people, for example, may be on good terms with many groups yet belong to none of them (such as politicians); they are called bridges. Others have dealings with few people (such as recluses); they are called abnormal vertices. Identifying bridges and abnormal vertices is important for mining different kinds of complex networks. In addition, many current graph clustering algorithms require input parameters, require parameter tuning, or have excessively high time complexity.

Second, processing dynamic graphs in real time and tracking cluster evolution events

Many algorithms propose automatic clustering methods according to their own definitions of a cluster. However, most current methods start from a very strict assumption, namely that the real world can be modeled as a static network. Many real-world networks, however, are updated continuously: over time, vertices and edges join and leave the network, which changes the graph topology, and the clustering result changes accordingly. Graph clustering algorithms should therefore adapt to dynamic graphs, but most current graph clustering methods target static networks and fall short when handling dynamic networks. Furthermore, traditional graph clustering algorithms aim to identify hidden cluster structure, whereas a dynamic graph clustering method should track the local topology of the dynamic graph and its abrupt changes, which few methods can currently do. These local topology changes and abrupt changes are also called cluster evolution events. Tracking cluster evolution events is of great practical significance. For example, during an election, a social network such as Facebook can perform dynamic graph clustering in the background, so that a candidate can learn in time the number and size of the clusters supporting each candidate and the detailed movement of people between them, and can then take timely measures, based on the real-time changes of the clusters, to secure his or her political support.

In short, current graph clustering algorithms do not solve the above two problems well, so new methods and techniques are urgently needed to meet the challenges of dynamic graph clustering.

Summary of the invention

To overcome the deficiencies of the prior art, the present invention provides a dynamic graph clustering algorithm based on density peaks.

The method of the present invention comprises the following steps:

Step 1: Calculate the structural similarity of each pair of adjacent vertices in the graph, using the following structural similarity formula:

where N[u] denotes the structural neighborhood of vertex u, i.e., for a graph G(V, E), N[u] = {v ∈ V | (u, v) ∈ E} ∪ {u}; N[v] denotes the structural neighborhood of vertex v; deg[u] denotes the degree of vertex u, which is the number of structural neighbors of u; and deg[v] denotes the degree of vertex v.
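A minimal sketch of Step 1, assuming the similarity takes the SCAN-style form σ(u, v) = |N[u] ∩ N[v]| / √(deg[u]·deg[v]). This form is consistent with the definitions of N[u] and deg[u] above, but it is an assumption and not necessarily the formula of the patent. The adjacency dictionary `adj` and the toy graph are hypothetical.

```python
from math import sqrt

def structural_similarity(adj, u, v):
    """Assumed form: |N[u] & N[v]| / sqrt(deg[u] * deg[v])."""
    nu = adj[u] | {u}  # closed (structural) neighborhood N[u]
    nv = adj[v] | {v}  # closed (structural) neighborhood N[v]
    # deg[u] is taken as the number of structural neighbors, |N[u]|
    return len(nu & nv) / sqrt(len(nu) * len(nv))

# Hypothetical toy graph (not Fig. 1): triangle a-b-c plus a pendant vertex d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
sim_ab = structural_similarity(adj, "a", "b")
sim_cd = structural_similarity(adj, "c", "d")
```

Here a and b share their whole closed neighborhood, so their similarity is 1, while the pendant edge (c, d) scores lower.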

The structural similarities between all pairs of adjacent vertices are then sorted in descending order, and the value at the 20% position of this descending order is taken as the similarity threshold σ_t. Let m denote the number of edges in the graph and let the subscript k denote the rank of the similarity threshold in the descending order of structural similarities; then k should satisfy:

Step 2: Calculate, in turn, three measures for each vertex in the graph: local density, dependent vertex, and dependency similarity.

Step 2-1: The local density formula is the basis of the entire algorithm and the key to whether clustering can be achieved on a graph. As a structure-based graph clustering algorithm, this patent first considers the local structure of a vertex and defines the local density of a vertex in terms of the structural similarities between the vertex and all of its structural neighbors. A continuous function μ_uv is designed, and the standard normal distribution is used as the weight of μ_uv in the local density formula. The value range of μ_uv is set to 0 < μ_uv ≤ 2 to exclude structural similarities that fall outside this range. Based on the structural similarity, the local density is calculated as:

Step 2-2: Among the neighbors of vertex u, the vertex whose local density is greater than that of u and whose structural similarity with u is highest is called the dependent vertex of u. The structural similarity between u and its dependent vertex is called the dependency similarity, denoted δ_u. The calculation formula is:

where N(u) denotes the open neighborhood of vertex u, i.e., for a graph G(V, E), N(u) = {v ∈ V | (u, v) ∈ E}. If no neighbor of u has a local density greater than that of u, then δ_u = 0 and the dependent vertex of u is left empty. If a vertex u has two or more candidate dependent vertices, the algorithm selects one of them at random as the dependent vertex of u.
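The selection rule of Step 2-2 can be sketched as follows. The names `adj`, `rho`, and `sim` and all values are hypothetical; `None` stands in for an empty dependent vertex, and ties are broken by whichever candidate `max` meets first rather than at random.

```python
def dependent_vertex(adj, rho, sim, u):
    """Return (dependent vertex, dependency similarity) of u, or (None, 0.0)."""
    # Candidates: open-neighborhood vertices with strictly greater local density.
    candidates = [v for v in adj[u] if rho[v] > rho[u]]
    if not candidates:
        return None, 0.0
    # Among the candidates, keep the one most structurally similar to u.
    best = max(candidates, key=lambda v: sim[frozenset((u, v))])
    return best, sim[frozenset((u, best))]

# Hypothetical values, not taken from Fig. 1.
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
rho = {"a": 1.0, "b": 2.0, "c": 3.0}
sim = {frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.5}
dep_a = dependent_vertex(adj, rho, sim, "a")
dep_c = dependent_vertex(adj, rho, sim, "c")
```

Vertex a depends on b, not on the denser c, because similarity decides among the denser neighbors; c has no denser neighbor and so has no dependent vertex.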

Step 3: Build the DP-Index for the entire graph from the three measures of each vertex. The DP-Index contains every vertex in the graph together with its local density, dependent vertex, and dependency similarity, and its entries are sorted in descending order of local density. Based on the DP-Index, the time complexity of the static graph clustering algorithm in this patent is O(n), where n is the number of vertices.
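One possible in-memory layout for such an index is a list of records sorted by local density. The sketch below is an illustration only: the `IndexEntry` type, its field names, and the values are assumptions, not the patent's storage format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexEntry:
    vertex: str
    rho: float          # local density
    dep: Optional[str]  # dependent vertex, None if empty
    delta: float        # dependency similarity

def build_dp_index(rho, dep, delta):
    """Collect the three measures per vertex, sorted by descending density."""
    entries = [IndexEntry(v, rho[v], dep[v], delta[v]) for v in rho]
    entries.sort(key=lambda e: e.rho, reverse=True)
    return entries

# Hypothetical measures for three vertices.
rho = {"a": 1.1, "b": 2.8, "c": 1.7}
dep = {"a": "c", "b": None, "c": "b"}
delta = {"a": 0.6, "b": 0.0, "c": 0.8}
index = build_dp_index(rho, dep, delta)
```

With the entries already in descending density order, the later steps can make a single pass over the index.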

Step 4: Using the local density ρ and dependency similarity δ defined in steps 2-1 and 2-2, generate a decision graph designed for graphs, with ρ as the abscissa and δ as the ordinate. According to the decision graph, vertices whose local density is greater than or equal to ξ and whose dependency similarity is less than γ are placed in the density peak vertex set, and vertices whose local density is less than ξ are placed in the noise vertex set.
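The two selection rules of Step 4 translate directly into set comprehensions. In this sketch the thresholds ξ = 1.5 and γ = 0.4 are the values used in the embodiment, while the vertex names and measures are hypothetical.

```python
def select_peaks_and_noise(rho, delta, xi, gamma):
    """Apply the decision-graph thresholds to every vertex."""
    # Density peak: dense enough, and weakly tied to any denser neighbor.
    peaks = {v for v in rho if rho[v] >= xi and delta[v] < gamma}
    # Noise: local density below the threshold xi.
    noise = {v for v in rho if rho[v] < xi}
    return peaks, noise

# Hypothetical measures.
rho = {"a": 2.8, "b": 1.7, "c": 1.1}
delta = {"a": 0.0, "b": 0.8, "c": 0.6}
peaks, noise = select_peaks_and_noise(rho, delta, xi=1.5, gamma=0.4)
```

Vertex b is neither a peak nor noise, so it is left for the assignment step that follows.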

Step 5: First assign a cluster to each vertex in the density peak vertex set. Then traverse the vertices that belong to neither the density peak vertex set nor the noise vertex set in descending order of local density, and assign each vertex to the cluster of the neighbor that has a greater local density than it and the highest structural similarity with it, finally obtaining the cluster result set.
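A minimal sketch of this assignment pass. It assumes that `order` lists the vertices by descending local density (as the DP-Index does) and that γ > 0, so that every remaining vertex has a dependent vertex. All names and values are hypothetical.

```python
def assign_clusters(order, dep, peaks, noise):
    """Label each non-noise vertex with the peak at the root of its chain."""
    label = {p: p for p in peaks}  # each density peak seeds its own cluster
    for v in order:
        if v in peaks or v in noise:
            continue
        # dep[v] is denser than v, so it was visited (and labeled) earlier.
        label[v] = label[dep[v]]
    return label

# Hypothetical chain: b depends on a, a depends on the peak p; n is noise.
order = ["p", "a", "b", "n"]
dep = {"p": None, "a": "p", "b": "a", "n": "b"}
label = assign_clusters(order, dep, peaks={"p"}, noise={"n"})
```

Because a dependent vertex always has a strictly greater density, one pass in descending order is enough and no vertex is visited twice.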

Step 6: Further divide the vertices in the noise vertex set: if the neighbors of a vertex u in the noise vertex set belong to different clusters, u is placed in the bridge vertex set; otherwise it is placed in the abnormal vertex set.
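The split of the noise set can be sketched as below, assuming `label` maps every clustered vertex to its cluster and that neighbors without a label (other noise vertices) are ignored. The graph and labels are hypothetical.

```python
def split_noise(adj, noise, label):
    """Divide noise vertices into bridge vertices and abnormal vertices."""
    bridges, abnormal = set(), set()
    for u in noise:
        # Distinct clusters reached through the neighbors of u.
        clusters = {label[v] for v in adj[u] if v in label}
        if len(clusters) >= 2:
            bridges.add(u)
        else:
            abnormal.add(u)
    return bridges, abnormal

# Hypothetical graph: n1 touches two clusters, n2 touches only one.
adj = {"n1": {"a", "b"}, "n2": {"a"}, "a": {"n1", "n2"}, "b": {"n1"}}
label = {"a": "c1", "b": "c2"}
bridges, abnormal = split_noise(adj, {"n1", "n2"}, label)
```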

Step 7: Obtain the dependency graph from the DP-Index, the density peak vertex set, and the noise vertex set. First initialize a dependency graph G′(V′, E′) with the vertex set V′ and edge set E′ both empty. Then, for each vertex u of the original graph G(V, E), if u belongs to the density peak vertex set or the noise vertex set, add only the vertex u to G′; otherwise add u and the edge between u and its dependent vertex to G′. At this point, each connected component of the dependency graph corresponds to a cluster, and each isolated vertex is a noise vertex.
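The construction above, together with reading the clusters back as connected components, can be sketched with plain dictionaries. The function names and the sample data are hypothetical; a graph library could replace the hand-written traversal.

```python
def build_dependency_graph(vertices, dep, peaks, noise):
    """Every vertex is added; only ordinary cluster members get an edge."""
    g = {v: set() for v in vertices}
    for u in vertices:
        if u in peaks or u in noise:
            continue  # peaks and noise vertices carry no dependency edge
        g[u].add(dep[u])
        g[dep[u]].add(u)
    return g

def connected_components(g):
    """Depth-first search over an undirected adjacency dictionary."""
    seen, comps = set(), []
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in g[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps

# Hypothetical data: peak p, members a and b, noise vertex n.
dep = {"p": None, "a": "p", "b": "a", "n": "b"}
g = build_dependency_graph(["p", "a", "b", "n"], dep, {"p"}, {"n"})
comps = connected_components(g)
```

The component {p, a, b} is one cluster and the isolated vertex n is noise.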

Step 8: In the dynamic detection stage, four kinds of changes to the dynamic graph are considered: adding or deleting an edge and adding or deleting a vertex. The DP-Index is updated in real time for each of these four kinds of change.

Adding an edge: when an edge (u, v) is added, the degrees of u and v each increase by 1, and further update operations are then performed on u and v. For vertex u, recompute the structural similarity between u and its neighbors, update the local density of u and its neighbors, then update the dependent vertex and dependency similarity of u, its neighbors, and the neighbors of its neighbors, and finally update the DP-Index according to the changed measures. The update for vertex v is the same as for u.

Deleting an edge: when an edge (u, v) is deleted, the operation is similar to adding an edge. The degrees of u and v each decrease by 1, and further update operations are then performed on u and v. For vertex u, recompute the structural similarity between u and its neighbors, update the local density of u and its neighbors, then update the dependent vertex and dependency similarity of u, its neighbors, and the neighbors of its neighbors, and finally update the DP-Index according to the changed measures. The update for vertex v is the same as for u.

Adding a vertex: when a vertex u is added, initialize its local density, dependent vertex, and dependency similarity, and add it to the DP-Index.

Deleting a vertex: when a vertex u is deleted, perform the edge deletion operation on the edge (u, v) between u and each of its neighbors v, and then delete u from the DP-Index.
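The scope of an edge update described above can be made concrete by collecting the vertices whose measures must be recomputed. This sketch only computes those sets; it assumes `adj` is the adjacency after the change, and the function name and path graph are hypothetical.

```python
def affected_by_edge_change(adj, u, v):
    """Vertices to refresh after edge (u, v) is added or deleted."""
    # Local density: both endpoints and their neighbors.
    density = set()
    for x in (u, v):
        density |= {x} | adj[x]
    # Dependent vertex / dependency similarity: one hop further out.
    dependency = set(density)
    for x in density:
        dependency |= adj[x]
    return density, dependency

# Hypothetical path graph 1-2-3-4-5 in which edge (1, 2) has just changed.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
density, dependency = affected_by_edge_change(adj, 1, 2)
```

Vertex 5 lies outside both sets, which is why the update stays local instead of re-clustering the whole graph.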

Step 9: Update the dependency graph according to the changes in the DP-Index. The update falls into the following five cases:

A non-noise vertex becomes a noise vertex: when a non-noise vertex u becomes a noise vertex, if the edge between u and its dependent vertex exists in the dependency graph, delete that edge.

A noise vertex becomes a non-noise vertex: when a noise vertex u becomes a non-noise vertex, if u is not a density peak vertex after the change, add the edge between u and its dependent vertex to the dependency graph.

A density peak vertex becomes a non-density-peak vertex: when a density peak vertex u becomes a non-density-peak vertex, if u is not a noise vertex after the change, add the edge between u and its dependent vertex to the dependency graph.

A non-density-peak vertex becomes a density peak vertex: when a non-density-peak vertex u that is not a noise vertex becomes a density peak vertex, delete the edge between u and its former dependent vertex from the dependency graph.

The dependent vertex of a vertex changes: if u is neither a noise vertex nor a density peak vertex and its dependent vertex changes, delete the edge between u and its former dependent vertex from the dependency graph and add the edge between u and its new dependent vertex.

Finally, the real-time clustering result is obtained from the connected components of the dependency graph, and cluster evolution events are obtained by monitoring changes in those connected components.
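One way to monitor the connected components is to compare the cluster sets before and after an update. The sketch below is an illustration: the event labels "form", "merge", "split", and "dissolve" are hypothetical names chosen here, not a taxonomy given in the text.

```python
def evolution_events(old, new):
    """Compare two lists of clusters (vertex sets) and report changes."""
    events = []
    for c in new:
        parents = [o for o in old if o & c]
        if not parents:
            events.append(("form", frozenset(c)))
        elif len(parents) > 1:
            events.append(("merge", frozenset(c)))
    for o in old:
        children = [c for c in new if c & o]
        if not children:
            events.append(("dissolve", frozenset(o)))
        elif len(children) > 1:
            events.append(("split", frozenset(o)))
    return events

# Hypothetical snapshots of the clusters.
merged = evolution_events([{1, 2}, {3, 4}], [{1, 2, 3, 4}])
split = evolution_events([{1, 2, 3, 4}], [{1, 2}, {3, 4}])
```

A cluster that only gains or loses a few vertices produces no event here; growth and shrinkage could be reported by also comparing sizes.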

The beneficial effects of the present invention are as follows:

1) A structure-based graph clustering algorithm is designed on the density peak idea. The algorithm requires no parameter tuning, achieves high accuracy, identifies clusters, bridge vertices, and abnormal vertices, and is supported by both theoretical and experimental guarantees.

2) Three new measures are defined for each vertex: local density, dependent vertex, and dependency similarity. From these measures a decision graph designed specifically for graphs is obtained, through which either a user-specified number of clusters can be obtained or the number of clusters can be identified automatically.

3) To speed up the algorithm, the present invention designs the DP-Index structure. With this index, the static structural graph clustering algorithm can efficiently compute reasonable clustering results online, and its time complexity depends only on the number of vertices.

4) To make the algorithm applicable to dynamic graphs, the present invention designs a data structure called the dependency graph and, based on it, designs algorithms for the dynamic changes of the graph, obtaining clustering results and cluster evolution events in real time.

Brief description of the drawings

Fig. 1 is an example undirected graph according to an embodiment of the present invention;

Fig. 2 is an example decision graph according to an embodiment of the present invention;

Fig. 3 is an example dependency graph according to an embodiment of the present invention;

Fig. 4 is the example undirected graph after edge (v₅, v₆) is added;

Fig. 5 is the example decision graph after edge (v₅, v₆) is added;

Fig. 6 is the example dependency graph after edge (v₅, v₆) is added;

Fig. 7 shows the cluster evolution events according to an embodiment of the present invention;

Fig. 8 is the algorithm flow chart according to an embodiment of the present invention.

In the figures, v₁ to v₁₁ are the vertices of the graph.

Specific embodiment

The present invention is further described below with reference to the drawings.

The algorithm consists mainly of two parts: a static structural graph clustering algorithm based on density peaks and a dynamic graph clustering algorithm based on density peaks. The main idea of the static algorithm is as follows. Based on the structural similarity and structural dependency between each vertex and its neighbors, first define the local density, dependent vertex, and dependency similarity of each vertex and generate the DP-Index; then generate the decision graph and use it to find the density peak vertices and noise vertices; finally, obtain the cluster result set from the density peak vertices and divide the noise vertex set into the abnormal vertex set and the bridge vertex set. In the initialization stage, the dynamic algorithm first builds the dependency graph and the DP-Index from the result of the static density peak clustering; it then incrementally updates the DP-Index and the dependency graph as the graph changes, and obtains the clustering results and cluster evolution events of the dynamic graph in real time from the dependency graph and its changes.

The technical solution adopted by the present invention is as follows. First, in the initialization stage, compute the structural similarity between every pair of adjacent vertices, then compute three measures for each vertex (local density, dependent vertex, and dependency similarity), build the DP-Index from these measures, and sort its entries in descending order of local density; this index effectively improves the efficiency of the algorithm. A decision graph designed specifically for graphs is generated from the DP-Index; vertices with large local density and relatively small dependency similarity are selected as density peak vertices, and vertices with small local density are noise vertices. The remaining vertices, i.e., those that are neither density peak vertices nor noise vertices, are traversed in the descending order of the DP-Index, and each vertex is placed in the same cluster as its dependent vertex, which yields the cluster result set. The noise vertex set is divided into the abnormal vertex set and the bridge vertex set. A dependency graph is then created from the clustering result to store that result and lay the foundation for dynamic updates. In the dynamic detection stage, the DP-Index and the dependency graph are updated as vertices and edges are inserted or deleted, and the clustering results and cluster evolution events are obtained from the dependency graph and its dynamic changes.

The present invention is divided into two stages: the initialization stage and the dynamic detection stage. The initialization stage involves three tasks: clustering the graph with the density-peak-based static structural graph clustering algorithm, constructing the DP-Index structure, and creating the dependency graph. The dynamic detection stage involves two tasks: updating the DP-Index and the dependency graph, and obtaining the clustering results and cluster evolution events. The core algorithms are the static structural graph clustering algorithm based on density peaks and the dynamic graph clustering algorithm, which updates the clustering result incrementally.

An example of the invention is illustrated using the undirected graph in Fig. 1, a small graph dataset designed to test the algorithm.

The specific implementation steps of the method are as follows:

Fig. 8 shows the flow chart of the algorithm of the present invention.

Step 1: Calculate the structural similarity of each pair of adjacent vertices in the graph using the structural similarity formula, where N[u] denotes the structural neighborhood of vertex u, i.e., for a graph G(V, E), N[u] = {v ∈ V | (u, v) ∈ E} ∪ {u}; N[v] denotes the structural neighborhood of vertex v; deg[u] denotes the degree of vertex u, which in this method is the number of structural neighbors of u; and deg[v] denotes the degree of vertex v. In Fig. 1, the structural similarity between v₁ and v₂ is computed with this formula, and the structural similarity of every other pair of adjacent vertices is computed in the same way.

The structural similarities are then sorted in descending order, and the value at the 20% position of this descending order is taken as the similarity threshold σ_t, which is a default parameter of the algorithm. Let m denote the number of edges in the graph and let the subscript k denote the rank of the similarity threshold in the descending order of structural similarities; k is then determined by m. For the undirected graph in Fig. 1, σ_t = 0.866.
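A minimal sketch of the threshold selection. The exact condition on k is not available here, so the rank rule k = ⌈0.2·m⌉ used below is an assumption, and the similarity values are hypothetical.

```python
from math import ceil

def similarity_threshold(similarities, fraction=0.2):
    """Return the similarity found at the given fraction of the descending order."""
    ranked = sorted(similarities, reverse=True)
    # Assumed rank rule: k = ceil(fraction * m), with ranks starting at 1.
    k = max(1, ceil(fraction * len(ranked)))
    return ranked[k - 1]

# Hypothetical similarities, one per edge (m = 10).
sims = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 1.0]
sigma_t = similarity_threshold(sims)
```

With ten edges the rule picks rank 2, so the second-largest similarity becomes σ_t.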

Step 2-1: The local density formula is the basis of the entire algorithm and the key to whether clustering can be achieved on a graph. In this patent, the local density of a vertex is defined in terms of the structural similarities between the vertex and all of its structural neighbors. To distinguish the local densities of different vertices more reasonably, the patent designs a continuous function μ_uv and uses the standard normal distribution as the weight of μ_uv in the local density formula. The value range of μ_uv is set to 0 < μ_uv ≤ 2 to exclude the influence of very small similarities on the local density. Based on the structural similarity, the local density is calculated as:

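Step 2-2: Among the neighbors of vertex u, the vertex whose local density is greater than that of u and whose structural similarity with u is highest is called the dependent vertex of u, and the structural similarity between u and its dependent vertex is called the dependency similarity δ_u. Here, N(u) denotes the open neighborhood of vertex u, i.e., for a graph G(V, E), N(u) = {v ∈ V | (u, v) ∈ E}. If no neighbor of u has a local density greater than that of u, then δ_u = 0 and the dependent vertex of u is left empty. If a vertex u has two or more candidate dependent vertices, the algorithm selects one of them at random as the dependent vertex of u.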

Step 3: Build the DP-Index for the entire graph from the three measures of each vertex. The DP-Index contains every vertex in the graph together with its local density, dependent vertex, and dependency similarity, and its entries are sorted in descending order of local density. The structure of the DP-Index is shown in Table 1. Based on the DP-Index, the time complexity of the static graph clustering algorithm in this patent is O(n), where n is the number of vertices. In other words, the time complexity depends only on the number of vertices, which greatly improves the efficiency of the algorithm.

Table 1. DP-Index index structure

Step 4: Using the local density ρ and dependency similarity δ defined in steps 2-1 and 2-2, generate the decision graph designed specifically for graphs in this patent, with ρ as the abscissa and δ as the ordinate. The decision graph corresponding to Fig. 1 is shown in Fig. 2. Let ξ = 1.5 and γ = 0.4. According to the decision graph, vertices whose local density is greater than or equal to ξ and whose dependency similarity is less than γ are placed in the density peak vertex set, and vertices whose local density is less than ξ are placed in the noise vertex set. The density peak vertex set is then {v₆, v₁₀} and the noise vertex set is {v₅, v₇, v₁₁}.

Step 5: First assign a cluster to each vertex in the density peak vertex set, so the cluster result set of Fig. 1 contains 2 clusters and is initially C = {{v₆}, {v₁₀}}. Then traverse the vertices that belong to neither the density peak vertex set nor the noise vertex set in descending order of local density, and assign each vertex to the cluster of the neighbor that has a greater local density than it and the highest structural similarity with it (that is, the cluster of its dependent vertex), finally obtaining the cluster result set C = {{v₁, v₂, v₃, v₄, v₆}, {v₈, v₉, v₁₀}}.

Step 6: Further divide the vertices in the noise vertex set: if the neighbors of a vertex u in the noise vertex set belong to two or more different clusters, u is placed in the bridge vertex set; otherwise it is placed in the abnormal vertex set. Because the neighbors of v₇ belong to two different clusters, v₇ is a bridge vertex. Because v₅ and v₁₁ each have only one neighbor, they are abnormal vertices.

Step 7: Obtain the dependency graph from the DP-Index, the density peak vertex set, and the noise vertex set. First initialize a dependency graph G′(V′, E′) with the vertex set V′ and edge set E′ both empty. Then, for each vertex u of the original graph G(V, E), if u belongs to the density peak vertex set or the noise vertex set, add only the vertex u to G′; otherwise add u and the edge between u and its dependent vertex to G′. The dependency graph corresponding to Fig. 1 is shown in Fig. 3. Each connected component of the dependency graph in Fig. 3 corresponds to a cluster: the connected component containing v₁, v₂, v₃, v₄, v₆ is one cluster of the clustering result, and the connected component containing v₈, v₉, v₁₀ is another. Each isolated vertex, such as v₅, v₇, and v₁₁, is a noise vertex. The dependency graph designed in this patent serves as the underlying structure for dynamic graph clustering; it is the bridge that allows the density-peak-based static structural graph clustering algorithm to obtain clustering results in real time on a dynamic graph and to track cluster evolution events.

Step 8: In the dynamic detection stage, this patent considers four kinds of changes to the dynamic graph: adding or deleting an edge and adding or deleting a vertex. The DP-Index is updated in real time for each of these four kinds of change.

Adding an edge: when an edge (u, v) is added, the degrees of u and v each increase by 1, and further update operations are then performed on u and v. For vertex u, recompute the structural similarity between u and its neighbors, update the local density of u and its neighbors, then update the dependent vertex and dependency similarity of u, its neighbors, and the neighbors of its neighbors, and finally update the DP-Index according to the changed measures. The update for vertex v is the same as for u. For example, when edge (v₅, v₆) is added, Fig. 1 becomes the graph shown in Fig. 4. For vertex v₅, the change first affects the structural similarity between v₅ and its neighbors v₂ and v₆, and hence the local density of v₅, v₂, and v₆, so these local densities are updated. Because a change in local density may affect the dependent vertex and dependency similarity, the dependent vertex and dependency similarity of v₅, v₂, and v₆ are updated next. Because the change in the local density of v₂ and v₆ may in turn affect the dependent vertex and dependency similarity of their neighbors v₁, v₃, v₄, and v₇, those are updated as well. Finally, the DP-Index is updated and re-sorted in descending order of local density. Vertex v₆ is processed in the same way as v₅; vertices that have already been updated need not be updated again, so only the local density of v₁, v₃, v₄, and v₇ and the dependent vertex and dependency similarity of v₈ need to be updated. The final updated DP-Index is shown in Table 2.

Deleting an edge: when an edge (u, v) is deleted, the operation is similar to adding an edge. The degrees of u and v each decrease by 1, and further update operations are then performed on u and v. For vertex u, recompute the structural similarity between u and its neighbors, update the local density of u and its neighbors, then update the dependent vertex and dependency similarity of u, its neighbors, and the neighbors of its neighbors, and finally update the DP-Index according to the changed measures. The update for vertex v is the same as for u. For example, when edge (v₅, v₆) is deleted from Fig. 4, the resulting graph is the one shown in Fig. 1. For vertex v₅, the change first affects the structural similarity between v₅ and the vertices v₂ and v₆, and hence the local density of v₅, v₂, and v₆, so these local densities are updated. Because a change in local density may affect the dependent vertex and dependency similarity, the dependent vertex and dependency similarity of v₅, v₂, and v₆ are updated next. Because the change in the local density of v₂ and v₆ may in turn affect the dependent vertex and dependency similarity of their neighbors v₁, v₃, v₄, and v₇, those are updated as well. Finally, the DP-Index is updated and re-sorted in descending order of local density. Vertex v₆ is updated in the same way as v₅; vertices that have already been updated need not be updated again, so only the local density of v₁, v₃, v₄, and v₇ and the dependent vertex and dependency similarity of v₈ need to be updated. The final updated DP-Index is shown in Table 1.

Table 2. DP-Index index structure

Adding a vertex: when a vertex u is added, initialize its local density, dependent vertex, and dependency similarity, and add it to the DP-Index. For example, when vertex v₁₂ is added to Fig. 1, the structural neighborhood of v₁₂ contains only v₁₂ itself, so its local density is initialized to 0.6872. Its dependent vertex is left empty and its dependency similarity is set to 0, and the DP-Index is then updated. The final updated DP-Index is shown in Table 3.
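The value 0.6872 can be reproduced under one reading of Step 2-1, sketched below. The forms used here are assumptions: μ_uv = σ_t / σ(u, v), with each structural neighbor contributing exp(−μ_uv² / 2) when 0 < μ_uv ≤ 2, and σ(u, v) taken in the SCAN-style form |N[u] ∩ N[v]| / √(deg[u]·deg[v]). Neither formula is quoted from the patent; they are chosen because they respect the stated range of μ_uv and match the reported number for an isolated vertex.

```python
from math import exp, sqrt

def local_density(adj, u, sigma_t):
    """Assumed local density: sum of exp(-mu^2 / 2) over the closed neighborhood."""
    rho = 0.0
    nu = adj[u] | {u}
    for v in nu:
        nv = adj[v] | {v}
        sim = len(nu & nv) / sqrt(len(nu) * len(nv))  # assumed similarity
        mu = sigma_t / sim  # assumed continuous function of the similarity
        if 0 < mu <= 2:     # range stated in Step 2-1
            rho += exp(-mu * mu / 2)
    return rho

# Isolated vertex v12 with the threshold reported for Fig. 1.
rho_isolated = local_density({"v12": set()}, "v12", sigma_t=0.866)
```

For an isolated vertex the only term is σ(u, u) = 1, so the density reduces to exp(−0.866² / 2), about 0.687.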

Table 3. DP-Index index structure

Deleting a vertex: when a vertex u is deleted, perform the edge deletion operation on the edge (u, v) between u and each of its neighbors v, and then delete u from the DP-Index. For example, when vertex v₅ is deleted from Fig. 1, edge (v₅, v₂) is deleted first, following the same update steps as the edge deletion operation above, and v₅ is then deleted from the index. The final updated DP-Index is shown in Table 4.

Table 4. DP-Index index structure

Step 9: Update the dependency graph according to the changes in the DP-Index. For each vertex whose local density, dependency similarity, or dependent vertex has changed, the algorithm first determines from the decision graph how the type of the vertex has changed, and then updates the dependency graph according to the case that applies.

The update of the dependency graph falls into the following five cases:

A non-noise vertex becomes a noise vertex: when a non-noise vertex u becomes a noise vertex, if the edge between u and its dependent vertex exists in the dependency graph, delete that edge. For example, deleting edge (v₄, v₆) from Fig. 1 changes the local density of v₄ from 1.695 to 1.078. The screening rule of the decision graph is the same as in Fig. 2: ξ = 1.5 and γ = 0.4. According to the decision graph, the local density of v₄ is now less than ξ, so v₄ changes from a non-noise vertex to a noise vertex, and edge (v₄, v₃) is deleted from the dependency graph, where v₃ is the former dependent vertex of v₄.

A noise vertex becomes a non-noise vertex: when a noise vertex u becomes a non-noise vertex, if u is not a density peak vertex after the change, the edge between u and its dependent vertex is added to the dependency graph. For example, when edge (v₅, v₆) is added to Fig. 1, the local density of v₅ increases from 1.079 to 1.639. The decision graph after the change is shown in Fig. 5, with ξ = 1.5 and γ = 0.4. According to the decision graph, the local density of v₅ is greater than ξ and its dependency similarity is greater than γ, so v₅ changes from a noise vertex to a non-noise vertex that is not a density peak vertex, and edge (v₅, v₂) is added to the dependency graph, where v₂ is the dependent vertex of v₅.

A density peak vertex becomes a non-density-peak vertex: when a density peak vertex u becomes a non-density-peak vertex, if u is not a noise vertex after the change, the edge between u and its dependent vertex is added to the dependency graph. For example, when edge (v₁₀, v₆) is added to Fig. 1, the local density of v₆ becomes 2.794, its dependency similarity becomes 0.845, and its dependent vertex becomes v₃. The screening rule of the decision graph is the same as in Fig. 2, with ξ = 1.5 and γ = 0.4. According to the decision graph, the local density of v₆ is greater than ξ and its dependency similarity is greater than γ, so v₆ changes from a density peak vertex to a non-density-peak vertex and is not a noise vertex; edge (v₆, v₃) is therefore added to the dependency graph.

A non-density-peak vertex becomes a density peak vertex: if a non-density-peak vertex u is not a noise vertex, then when u becomes a density peak vertex, the edge between u and its former dependent vertex is deleted from the dependency graph. For example, when edge (v₁₀, v₁₁) is deleted in Fig. 1, the local density of the non-noise vertex v₈ becomes 2.225, its dependency similarity becomes 0, and its dependent vertex becomes empty. The screening rule of the decision graph is the same as in Fig. 2, with ξ = 1.5 and γ = 0.4. According to the decision graph, the local density of v₈ is greater than ξ and its dependency similarity is less than γ, so v₈ changes from a non-density-peak vertex to a density peak vertex; edge (v₈, v₁₀) is therefore deleted from the dependency graph, where v₁₀ is the former dependent vertex of v₈.

The dependent vertex of a vertex changes: if vertex u is neither a noise vertex nor a density peak vertex, and the dependent vertex of u changes from one vertex to another, then the edge between u and its former dependent vertex is deleted from the dependency graph and the edge between u and its new dependent vertex is added. For example, when edge (v₅, v₆) is added to Fig. 1, the updated DP-Index is as shown in Table 2, in which the dependent vertex of v₂ changes from v₃ to v₆. The screening rule of the decision graph is the same as in Fig. 2, with ξ = 1.5 and γ = 0.4. According to the decision graph, vertex v₂ is neither a noise vertex nor a density peak vertex, so edge (v₂, v₃) is deleted from the dependency graph and edge (v₂, v₆) is added.
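The five cases can be read as one rule: a vertex contributes the edge to its dependent vertex exactly when it is neither a noise vertex nor a density peak vertex, so an update only has to compare the edge implied before the change with the edge implied after it. The sketch below takes that reading; it is an interpretation rather than the patented procedure, the function names are hypothetical, and the dependency similarities 0.6 and 0.7 in the usage lines are illustrative assumptions (the densities, ξ = 1.5 and γ = 0.4 come from the examples).

```python
from typing import Optional, Set, Tuple

NOISE, PEAK, ORDINARY = "noise", "density_peak", "ordinary"
Edge = Tuple[str, str]                         # (vertex, its dependent vertex)
Measures = Tuple[float, float, Optional[str]]  # (density, similarity, dependent)


def classify(rho: float, delta: float, xi: float, gamma: float) -> str:
    """Decision-graph screening: noise below xi, peak if similarity below gamma."""
    if rho < xi:
        return NOISE
    if delta < gamma:
        return PEAK
    return ORDINARY


def dependency_edge(u: str, rho: float, delta: float, dep: Optional[str],
                    xi: float, gamma: float) -> Optional[Edge]:
    """Edge contributed by u, or None for noise and density peak vertices."""
    if dep is not None and classify(rho, delta, xi, gamma) == ORDINARY:
        return (u, dep)
    return None


def update_dependency_graph(edges: Set[Edge], u: str, old: Measures,
                            new: Measures, xi: float = 1.5,
                            gamma: float = 0.4) -> None:
    """Apply the change of u's measures to the dependency graph in place."""
    old_edge = dependency_edge(u, *old, xi, gamma)
    new_edge = dependency_edge(u, *new, xi, gamma)
    if old_edge != new_edge:
        edges.discard(old_edge)    # no effect when old_edge is None
        if new_edge is not None:
            edges.add(new_edge)


graph_edges: Set[Edge] = {("v4", "v3"), ("v2", "v3")}
# Case 1: v4 drops below xi and becomes a noise vertex.
update_dependency_graph(graph_edges, "v4", (1.695, 0.6, "v3"), (1.078, 0.6, "v3"))
# Case 2: v5 rises above xi with similarity above gamma.
update_dependency_graph(graph_edges, "v5", (1.079, 0.0, None), (1.639, 0.7, "v2"))
# Case 5: the dependent vertex of v2 changes from v3 to v6.
update_dependency_graph(graph_edges, "v2", (2.0, 0.7, "v3"), (2.0, 0.7, "v6"))
```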

The real-time clustering result is obtained from the connected components of the dependency graph, and cluster evolution events are obtained by monitoring how those connected components change. The advantage is that evolution events are obtained in real time as the graph changes, without first computing the clustering result and then comparing successive results. As shown in Fig. 7, cluster evolution events include the birth and death of clusters, the expansion and contraction of clusters, and the merging and splitting of clusters. For example, when edge (v₅, v₆) is added to Fig. 1, the dependency graph changes from Fig. 3 to Fig. 6. During this change, edge (v₂, v₅) is added, so the connected component on the left gains vertex v₅. Because the connected component containing v₅ in Fig. 3 consists of that single vertex, the evolution event that occurs is a cluster expansion, and the clustering result is given by the vertices contained in each connected component. The cluster result set is C = {{v₁, v₂, v₃, v₄, v₅, v₆}, {v₈, v₉, v₁₀}}. Among the remaining isolated vertices, v₇ is a bridge vertex because its neighbors belong to two different clusters, and v₁₁ is an abnormal vertex because it has only one neighbor. During the change from Fig. 3 to Fig. 6, edge (v₂, v₃) is also deleted and edge (v₂, v₆) is added, but because the vertices in the resulting connected components do not change, the clusters do not change either, and no cluster evolution event occurs.
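Reading the clustering result off the dependency graph can be sketched as below. The original graph and most dependency edges are hypothetical, since the figures are not reproduced here; only the edges (v₅, v₂) and (v₂, v₆) and the expected result come from the example. The set of noise vertices is passed in explicitly rather than inferred from isolation, which is a design choice of this sketch.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(vertices: Set[str],
                         edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Breadth-first search over the dependency graph, treated as undirected."""
    nbrs: Dict[str, Set[str]] = {v: set() for v in vertices}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(vertices):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            x = queue.popleft()
            component.add(x)
            for y in nbrs[x] - seen:
                seen.add(y)
                queue.append(y)
        components.append(component)
    return components


def cluster_result(graph: Dict[str, Set[str]],
                   dep_edges: Iterable[Tuple[str, str]], noise: Set[str]):
    """Return (clusters, bridge vertices, abnormal vertices)."""
    components = connected_components(set(graph), dep_edges)
    clusters = [c for c in components if not c <= noise]
    label = {v: i for i, c in enumerate(clusters) for v in c}
    bridges, abnormal = set(), set()
    for v in noise:
        touched = {label[w] for w in graph[v] if w in label}
        (bridges if len(touched) >= 2 else abnormal).add(v)
    return clusters, bridges, abnormal


# Hypothetical original graph and dependency edges.
original_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v2", "v5"),
                  ("v2", "v6"), ("v3", "v4"), ("v3", "v6"), ("v4", "v6"),
                  ("v5", "v6"), ("v6", "v7"), ("v7", "v8"), ("v8", "v9"),
                  ("v8", "v10"), ("v9", "v10"), ("v10", "v11")]
graph: Dict[str, Set[str]] = {}
for a, b in original_edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
dep_edges = [("v1", "v2"), ("v2", "v6"), ("v3", "v6"), ("v4", "v6"),
             ("v5", "v2"), ("v9", "v8"), ("v10", "v8")]

clusters, bridges, abnormal = cluster_result(graph, dep_edges, {"v7", "v11"})
```

This sketch recomputes the components from scratch; the method described above instead monitors the components as dependency edges are added and removed, which is what makes the evolution events available without comparing successive results.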

Through the above steps, the task of clustering a dynamic graph based on the density peak idea is fully realized. The static structural graph clustering algorithm provided by the invention is better in clustering accuracy than existing structure-based graph clustering algorithms and achieves a several-fold improvement in efficiency. In addition, the algorithm clusters dynamic graphs, returning graph clustering results in real time and obtaining cluster evolution events.
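For orientation, the static stage that the dynamic updates build on (choosing dependent vertices, then assigning clusters in descending order of local density) can be sketched as follows. Local densities and structural similarities are taken as given inputs, because their formulas are not reproduced here; all names and numeric values are hypothetical. Ties between candidate dependent vertices are broken by sorted order in this sketch, whereas the method selects one at random.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Sim = Dict[FrozenSet[str], float]  # structural similarity per edge


def dependent_vertex(u: str, adj: Dict[str, Set[str]], rho: Dict[str, float],
                     sim: Sim) -> Tuple[Optional[str], float]:
    """Denser neighbor most similar to u, with the dependency similarity."""
    candidates = [v for v in sorted(adj[u]) if rho[v] > rho[u]]
    if not candidates:
        return None, 0.0           # no denser neighbor: similarity is 0
    best = max(candidates, key=lambda v: sim[frozenset((u, v))])
    return best, sim[frozenset((u, best))]


def static_clusters(adj: Dict[str, Set[str]], rho: Dict[str, float], sim: Sim,
                    xi: float, gamma: float) -> Dict[str, str]:
    """Map each non-noise vertex to the density peak vertex of its cluster."""
    label: Dict[str, str] = {}
    for u in sorted(adj, key=lambda x: rho[x], reverse=True):
        if rho[u] < xi:
            continue               # noise vertex, classified separately
        dep, delta = dependent_vertex(u, adj, rho, sim)
        if delta < gamma:
            label[u] = u           # density peak vertex starts a cluster
        else:
            label[u] = label[dep]  # dep is denser, so it is already labelled
    return label


# Hypothetical input: two triangles joined through a sparse vertex m.
weighted = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7,
            ("x", "y"): 0.9, ("x", "z"): 0.8, ("y", "z"): 0.7,
            ("c", "m"): 0.3, ("m", "z"): 0.3}
adj: Dict[str, Set[str]] = {}
sim: Sim = {}
for (p, q), s in weighted.items():
    adj.setdefault(p, set()).add(q)
    adj.setdefault(q, set()).add(p)
    sim[frozenset((p, q))] = s
rho = {"a": 2.5, "b": 2.0, "c": 1.8, "x": 2.4, "y": 1.9, "z": 1.7, "m": 0.9}

labels = static_clusters(adj, rho, sim, xi=1.5, gamma=0.4)
```

Because a dependent vertex always has a strictly greater local density than the vertex depending on it, processing vertices in descending density order guarantees that the cluster of the dependent vertex is known when it is needed.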

Claims

1. A dynamic graph clustering method based on density peaks, characterized by comprising the following steps:

where N[u] denotes the structural neighborhood of vertex u, i.e., for a graph G(V, E), N[u] = {v ∈ V | (u, v) ∈ E} ∪ {u}; N[v] denotes the structural neighborhood of vertex v; deg[u] denotes the degree of vertex u, which is the number of vertices in the structural neighborhood of u; and deg[v] denotes the degree of vertex v;

the structural similarities between all vertices are then sorted in descending order, and the value at the 20% position of the descending order is taken as the similarity threshold σ_t; let m denote the number of edges in the graph and let k denote the rank of the similarity threshold in the descending order of structural similarities; then k should satisfy:

Step 2: successively calculate three measures for each vertex in the graph: local density, dependent vertex, and dependency similarity;

Step 2-1: the local density formula is the foundation of the whole algorithm and the key factor in whether clustering can be achieved on a graph; as a structure-based graph clustering algorithm, the method must first consider the local structure of a vertex, so the local density of any vertex is defined to include the structural similarities between that vertex and all vertices in its structural neighborhood; a serialization function μ_uv is designed, and the standard normal distribution is used as the weight of μ_uv in the local density formula; the value range of μ_uv is set to 0 < μ_uv ≤ 2 to exclude structural similarities that do not satisfy this range; based on the structural similarity, the local density is calculated as:

Step 2-2: among the neighbor vertices of vertex u, the vertex whose local density is greater than that of u and whose structural similarity with u is highest is called the dependent vertex of u; the structural similarity between u and its dependent vertex is called the dependency similarity, denoted δ_u, and is calculated as:

where N(u) denotes the open neighborhood of vertex u, i.e., for a graph G(V, E), N(u) = {v ∈ V | (u, v) ∈ E}; if no neighbor vertex of u has a local density greater than that of u, then δ_u = 0 and the dependent vertex of u is set to empty; if a vertex u has two or more candidate dependent vertices, the algorithm selects one of them at random as the dependent vertex of u;

Step 3: build the DP-Index for the whole graph from the three measures of each vertex; the DP-Index contains every vertex in the graph together with its local density, dependent vertex, and dependency similarity, and the vertices in the index are sorted in descending order of local density; based on the DP-Index, the time complexity of the static graph clustering algorithm in this patent is O(n), where n is the number of vertices;

Step 4: using the local density ρ and the dependency similarity δ defined in steps 2-1 and 2-2, generate a decision graph designed for graphs, with ρ as the abscissa and δ as the ordinate; then, according to the decision graph, select the vertices whose local density is greater than or equal to ξ and whose dependency similarity is less than γ into the density peak vertex set, and select the vertices whose local density is less than ξ into the noise vertex set;

Step 5: first assign a cluster to each vertex in the density peak vertex set; then traverse the vertices that belong to neither the density peak vertex set nor the noise vertex set in descending order of local density, and assign each such vertex to the cluster of the neighbor vertex that has a greater local density than it and the highest structural similarity with it, finally obtaining the cluster result set;

Step 6: further divide the vertices in the noise vertex set: if the neighbors of a vertex u in the noise vertex set belong to different clusters, u is selected into the bridge vertex set; otherwise, u is selected into the abnormal vertex set;

Step 7: obtain the dependency graph from the DP-Index, the density peak vertex set, and the noise vertex set; first initialize a dependency graph G′(V′, E′) whose vertex set V′ and edge set E′ are empty; then, for each vertex u in the original graph G(V, E), if u belongs to the density peak vertex set or the noise vertex set, add vertex u to the dependency graph G′; otherwise, add vertex u and the edge between u and its dependent vertex to the dependency graph G′; at this point, each connected component of the dependency graph corresponds to a cluster, and each isolated vertex is a noise vertex;

Step 8: in the dynamic detection stage, four kinds of changes to the dynamic graph are considered: adding or deleting an edge and adding or deleting a vertex; the DP-Index is updated in real time for each of these four kinds of change;

Adding an edge: when edge (u, v) is added, the degrees of vertex u and vertex v are each increased by 1, and further update operations are then performed on u and v; for vertex u, the structural similarity between u and its neighbor vertices is recalculated and the local densities of u and its neighbors are updated; the dependent vertices and dependency similarities of u, its neighbors, and the neighbors of its neighbors are then updated; finally, the DP-Index is updated according to the changed measures; the update operation for vertex v is the same as for vertex u;

Deleting an edge: when edge (u, v) is deleted, the operation is similar to adding an edge; the degrees of vertex u and vertex v are each decreased by 1, and further update operations are then performed on u and v; for vertex u, the structural similarity between u and its neighbor vertices is recalculated and the local densities of u and its neighbors are updated; the dependent vertices and dependency similarities of u, its neighbors, and the neighbors of its neighbors are then updated; finally, the DP-Index is updated according to the changed measures; the update operation for vertex v is the same as for vertex u;

Adding a vertex: when vertex u is added, its local density, dependent vertex, and dependency similarity are initialized, and u is added to the DP-Index;

Deleting a vertex: when vertex u is deleted, the edge deletion operation is performed for the edge (u, v) between u and each of its neighbor vertices v, and u is then removed from the DP-Index;

Step 9: update the dependency graph according to the changes in the DP-Index; updates to the dependency graph fall into the following five cases:

A non-noise vertex becomes a noise vertex: when a non-noise vertex u becomes a noise vertex, if the dependency graph contains the edge between u and its dependent vertex, that edge is deleted;

A noise vertex becomes a non-noise vertex: when a noise vertex u becomes a non-noise vertex, if u is not a density peak vertex after the change, the edge between u and its dependent vertex is added to the dependency graph;

A density peak vertex becomes a non-density-peak vertex: when a density peak vertex u becomes a non-density-peak vertex, if u is not a noise vertex after the change, the edge between u and its dependent vertex is added to the dependency graph;

A non-density-peak vertex becomes a density peak vertex: if a non-density-peak vertex u is not a noise vertex, then when u becomes a density peak vertex, the edge between u and its former dependent vertex is deleted from the dependency graph;

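The dependent vertex of a vertex changes: if vertex u is neither a noise vertex nor a density peak vertex, and the dependent vertex of u changes from one vertex to another, then the edge between u and its former dependent vertex is deleted from the dependency graph and the edge between u and its new dependent vertex is added;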

Finally, the real-time clustering result is obtained from the connected components of the dependency graph, and cluster evolution events are obtained by monitoring the changes of the connected components in the dependency graph.