CN107392365A

CN107392365A - Influence maximization method for the independent cascade model based on propagation path analysis

Info

Publication number: CN107392365A
Application number: CN201710568222.7A
Authority: CN
Inventors: 刘维; 陈昕; 吴蔷梅
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2017-07-11
Filing date: 2017-07-11
Publication date: 2017-11-24

Abstract

The present invention relates to an influence maximization method for the independent cascade model based on propagation path analysis. The method takes a complex network as input and determines the seed nodes of the initial propagation, generates propagation paths, constructs the set of the top m shortest paths with the highest activation probability, computes the final activation probability of each path set, selects node sets, and finally obtains, through a greedy maximum coverage algorithm, the seed set S that covers the most sets. The invention overcomes the defects of existing methods: inaccuracy caused by the lack of a deeper, more detailed examination of the environment around a node, and high time complexity that prevents application to large-scale networks. By considering only the top m shortest paths between nodes, the invention avoids many unnecessary calculations, and it can apply a greedy solution to the identified influential node sets to obtain an optimal seed node set S.

Description

Influence maximization method for the independent cascade model based on propagation path analysis

Technical field

The invention belongs to the methods that use the independent cascade model to identify influence-maximizing nodes in complex networks, and in particular relates to an influence maximization method for the independent cascade model based on propagation path analysis.

Background technology

The important nodes of a complex network are the special nodes that, compared with the other nodes of the network, can affect the structure and function of the network to a greater extent. In recent years the identification of influential nodes has attracted increasingly wide attention, not only because of its great theoretical significance but also because of its broad practical value.

With the development of the Internet, information floods our daily lives, and searching for information has become something we must do every day. To find required information effectively, or to see current hot news, one can use a search engine or check the information posted by influential spreaders in the network. Many social networks, such as Twitter and Delicious, allow users to communicate and post information. Identifying influential users makes it possible to release information effectively, so that both the breadth and the depth of its propagation increase. Different networks publish different information and operate in different ways. On interactive platforms such as Sina Weibo or Tencent Weibo in China, anyone can post information, whether about personal interests or about current social hot topics, and users also interact with one another. Each user's influence is different, so the breadth of propagation of the information a user posts, whether it breaks out (propagates on a large scale), the time of the outbreak, and its duration all differ. Information posted by a user with strong influence is quickly forwarded and followed, spreads widely, and soon becomes a social hot topic. Similarly, to control public opinion or a disease outbreak, one can find the nodes that propagate most effectively and take measures at those nodes to achieve effective control. The identification of influential nodes in complex networks is therefore an important research topic with great theoretical significance and practical value. Many algorithms for identifying influential nodes have been designed, including degree centrality (DC), closeness centrality (CC), and betweenness centrality (BC).

Before the present invention, these methods had defects and deficiencies in identifying influential nodes. The shortcoming of degree centrality (DC) is that it considers only the most local information of a node: it describes the node's most direct influence and does not examine the node's surrounding environment (such as its position in the network or its higher-order neighbors) in more depth and detail, so it is not accurate enough in many cases. Closeness centrality (CC) determines the centrality of a node from the relative distances between all node pairs and is widely used in research, but its time complexity is high, so it is not very practical in today's large-scale network environments. Betweenness centrality (BC) also has high time complexity and is likewise not readily applicable to large-scale networks.

Summary of the invention

The object of the present invention is to overcome the above defects by providing an influence maximization method for the independent cascade model based on propagation path analysis.

The technical solution of the present invention is:

An influence maximization method for the independent cascade model based on propagation path analysis, characterized mainly in that it comprises the following steps:

(1) Input a complex network and determine the seed nodes of the initial propagation;

(2) Generate propagation paths: a seed node attempts to activate any vertex in the network, and if the activation succeeds, a propagation path is produced;

(3) Construct the set of the top m shortest paths with the highest activation probability: the maximum-probability shortest path is obtained with a single-source shortest path method. Because this method is based on the independent cascade (IC) model, it considers not only the shortest path of maximum probability but also the top m paths of maximum probability;

(4) Compute the final activation probability of the path set: the set of the top m maximum-probability shortest paths obtained in step (3) is substituted into the proposed propagation path formula to compute the final activation probability;

(5) Select node sets: a threshold is set, and the activation probabilities obtained in step (4) are screened against it to produce new node sets;

(6) Solve maximum coverage over the node sets: a greedy maximum coverage algorithm finally yields the seed set S that covers the most sets.

Regarding step (3), constructing the set of the top m shortest paths with the highest activation probability: in addition to the maximum-probability shortest path, the method considers the top m maximum-probability shortest paths between the seed node u and the vertex v, rather than computing with the probability of a single shortest path alone. Computing over the top m maximum-probability shortest paths makes the proposed method more accurate, so that influential nodes in the network can be identified efficiently.

Regarding step (6), solving maximum coverage over the node sets: the node sets are obtained through steps (3), (4), and (5). In the maximum coverage problem, the seed set S covers a node set obtained in step (5) when at least one element of S appears in that node set. The seed set S obtained by the maximum coverage algorithm can influence or activate the largest number of vertices in the complex network.

The advantages and effects of the present invention are as follows. By proposing a new propagation path model, the invention can find, in a complex network, the top m shortest paths of maximum probability from a given node to the other nodes. A seed set S that is most likely to activate the largest number of nodes is sought from the new propagation path model, and the final seed set S is obtained with a greedy maximum coverage algorithm. The invention makes the prediction results more accurate and more reliable. Because the proposed method considers only the top m shortest paths between nodes, it avoids many unnecessary calculations, and it can apply a greedy solution to the identified influential node sets to obtain an optimal seed node set S. The invention improves the efficiency of identifying influential nodes in a network and extends the application and practicality of the technique in the field of influence maximization.

Brief description of the drawings

Fig. 1 --- Schematic flow chart of the present invention.

Fig. 2 --- Comparison of the Kendall correlation coefficient τ of the present invention and other prior methods, where a is the comparison on data set USAir97, b is the comparison on data set PGP, and c is the comparison on data set Email.

Fig. 3 --- Comparison of the number of activated nodes for the present invention and other methods. Panel a is the comparison on data set PGP: a(1) compares the PPA method with the DC method, a(2) compares the PPA method with the CC method, and a(3) compares the PPA method with the BC method. Panel b is the comparison on data set USAir97: b(1) compares the PPA method with the DC method, b(2) compares the PPA method with the CC method, and b(3) compares the PPA method with the BC method. Panel c is the comparison on data set Email: c(1) compares the PPA method with the DC method, c(2) compares the PPA method with the CC method, and c(3) compares the PPA method with the BC method.

Embodiment

1. Description of the steps

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

First, the complex network and the seed nodes are input.

Step (1): generate propagation paths

If u is a seed node and it successfully activates the vertex v, a propagation path from u to v is formed, which we denote l_uv. Let:

l_uv = (e₁, e₂, …, e_m)

That is, the propagation path from node u to node v is formed by connecting the edges e₁, e₂, …, e_m. We then define the probability that node u successfully activates node v along this path as:

P(l_uv) = P(e₁) · P(e₂) · … · P(e_m)

where P(e_i) is the activation probability on edge e_i. Proceeding in this way, we obtain the initial propagation paths.
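As an illustration only, the probability of a single propagation path can be sketched as the product of its edge activation probabilities. The function name and the example values below are hypothetical and do not come from the patent.

```python
from math import prod


def path_probability(edge_probs):
    """Probability that activation succeeds on every edge of one path.

    edge_probs: iterable of activation probabilities P(e_i), one per edge.
    """
    return prod(edge_probs)


# Hypothetical three-edge path from u to v.
example = path_probability([0.5, 0.4, 0.5])
```

Because the probabilities multiply, a longer path can never be more probable than any of its prefixes, which is what later allows a shortest path algorithm to be used.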

Step (2): construct the set of the top m shortest paths with the highest activation probability

There is more than one path from node u to node v, and we choose the one with the maximum probability. This path can be obtained with a single-source shortest path algorithm: assign each edge e in G the weight −ln[p(e)], then find the shortest paths from node u to the other vertices. Because this method is proposed on the basis of the independent cascade (IC) model, it considers not only the shortest path of maximum probability but also the top m paths of maximum probability, as follows:

Let G₁ = G, and take the maximum-probability path from node u to node v in the graph G. The graphs G₁, G₂, …, G_m are then constructed in turn.

In each G_i, the maximum-probability path from node u to node v is found.
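The following is a minimal sketch of this step, not the patented implementation. It runs Dijkstra's algorithm on the weights −ln[p(e)], so the shortest path is the maximum-probability path. The rule for deriving G₂, …, G_m is not reproduced in the text above, so the sketch assumes that each graph is the previous one with the edges of the path just found removed. The graph layout (a dict of dicts for a directed graph) and all function names are hypothetical.

```python
import heapq
from math import exp, log


def max_probability_path(graph, source, target):
    """Dijkstra on weights -ln p(e). Returns (probability, path) or None.

    graph: dict node -> dict neighbor -> activation probability in (0, 1].
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, {}).items():
            nd = d - log(p)  # adding -ln p multiplies probabilities
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return exp(-dist[target]), path


def top_m_paths(graph, source, target, m):
    """Up to m paths, found one per graph G_1, G_2, ..., G_m.

    ASSUMPTION: G_{i+1} is G_i with the edges of the i-th path removed.
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # copy, keep input intact
    paths = []
    for _ in range(m):
        found = max_probability_path(g, source, target)
        if found is None:
            break
        paths.append(found)
        for a, b in zip(found[1], found[1][1:]):
            del g[a][b]
    return paths


# Hypothetical network with two routes from u to v.
example_graph = {"u": {"a": 0.9, "b": 0.5}, "a": {"v": 0.8}, "b": {"v": 0.6}}
example_paths = top_m_paths(example_graph, "u", "v", 3)
```

In the example only two paths exist, so fewer than m = 3 paths are returned. Under the edge-removal assumption the paths found are edge-disjoint, which keeps the number of shortest path computations at m per vertex pair.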

Step (3): compute the final activation probability of the path set

This step uses the set of graphs G_i obtained in step (2): in each G_i, the maximum-probability path from node u to node v can be found. We denote by L_uv the set formed by the m maximum-probability paths from u to v, and P(L_uv) is then the probability that v is activated with u as the seed node.
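The formula for the final activation probability is not reproduced in the text above. Purely as an assumption for illustration, the sketch below treats the m paths as independent activation attempts, so that v is activated unless every path fails. The patent's own formula may differ, and the function name is hypothetical.

```python
from math import prod


def combined_activation_probability(path_probs):
    """ASSUMED combination rule: 1 - product of (1 - P(path)).

    path_probs: probabilities of the m maximum-probability paths from u to v.
    """
    return 1.0 - prod(1.0 - p for p in path_probs)


# Hypothetical path probabilities for one (u, v) pair.
example = combined_activation_probability([0.72, 0.30])
```

Under this assumption, adding a further path can only raise the combined probability, which matches the stated motivation for considering m paths instead of one.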

Step (4): select node sets

By iterating the calculation of step (3), we obtain for each vertex pair (u, v) the value P(L_uv), i.e. the probability that v is activated with u as the seed node. To find the initial seed set S that can activate the most vertices with the highest probability, we set a threshold θ, as follows:

For P(L_uv) < θ, set P(L_uv) = 0.

In this way, for each vertex v we obtain the set

R(v) = { u | P(L_uv) > θ }

which is the set of vertices most likely to activate v.
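The thresholding can be sketched as a simple filter over the pairwise probabilities. The dictionary layout and the function name are hypothetical, and the example values are made up.

```python
def candidate_activators(pair_probs, theta):
    """Build R(v) = {u | P(L_uv) > theta} for every vertex v.

    pair_probs: dict mapping (u, v) -> P(L_uv).
    Vertices with no activator above the threshold get no entry.
    """
    r_sets = {}
    for (u, v), p in pair_probs.items():
        if p > theta:
            r_sets.setdefault(v, set()).add(u)
    return r_sets


# Hypothetical pairwise probabilities.
example = candidate_activators(
    {("a", "v1"): 0.8, ("b", "v1"): 0.2, ("a", "v2"): 0.5}, theta=0.3
)
```

A higher θ gives smaller sets R(v) and a cheaper coverage step, at the cost of ignoring weaker activators.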

Step (5): solve maximum coverage over the node sets

This step is in effect a maximum coverage problem over the seed vertex sets R(V₁), R(V₂), …, R(V_n) obtained in step (4): find a set S with |S| = k such that S covers the largest number of the sets R(V₁), R(V₂), …, R(V_n).

S is said to cover R(V_i) if at least one element of S appears in R(V_i), i.e. S ∩ R(V_i) ≠ ∅. We write F_k(S) for the number of sets among R(V₁), R(V₂), …, R(V_n) that S covers.

Finally, the seed set S is output: the seed node set that can influence the whole network to the greatest extent.
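A minimal sketch of a greedy maximum coverage procedure follows. It repeatedly picks the vertex that appears in the most sets R(v) that are not yet covered. The function names and the tie-breaking rule (smallest label first) are assumptions made for the example. The standard greedy procedure is an approximation with a known (1 − 1/e) guarantee rather than an exact solver.

```python
def coverage(seeds, r_sets):
    """F_k(S): number of sets R(v) containing at least one seed."""
    chosen = set(seeds)
    return sum(1 for members in r_sets.values() if members & chosen)


def greedy_max_cover(r_sets, k):
    """Pick up to k seeds, each time maximizing the number of newly covered sets.

    r_sets: dict mapping vertex v -> set R(v) of candidate activators.
    """
    uncovered = {v: set(us) for v, us in r_sets.items() if us}
    seeds = []
    for _ in range(k):
        gain = {}
        for members in uncovered.values():
            for u in members:
                gain[u] = gain.get(u, 0) + 1
        if not gain:
            break  # every coverable set is already covered
        best = max(sorted(gain), key=gain.get)  # sorted() makes ties deterministic
        seeds.append(best)
        uncovered = {v: us for v, us in uncovered.items() if best not in us}
    return seeds


# Hypothetical sets R(v) for four vertices.
example_sets = {"v1": {"a", "b"}, "v2": {"a"}, "v3": {"c"}, "v4": {"b", "c"}}
example_seeds = greedy_max_cover(example_sets, k=2)
```

In the example, the first pick covers two sets and the second pick covers the remaining two, so two seeds cover all four sets.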

2. Embodiment

Comparison of the Kendall correlation coefficient τ

Fig. 2 shows, on three real data sets, the comparison between the PPA method and three other methods on the Kendall correlation coefficient τ evaluation index, and demonstrates that the method of the invention is better than the other three in terms of feasibility. Panels a, b, and c of Fig. 2 show the comparison of τ on the data sets USAir97, PGP, and Email, respectively. We compared our method, the influence maximization method for the independent cascade model based on propagation path analysis (PPA), with three existing methods: degree centrality (DC), closeness centrality (CC), and betweenness centrality (BC). The three experimental plots in Fig. 2 show clearly that the τ value of the PPA method generally stays ahead of the other three methods, which indicates that the proposed method is feasible and effective. In Fig. 2 c, the τ of our method is at first slightly lower than that of the DC method, but as the propagation rate increases, the PPA method again exceeds the DC method. Comparing panels b and c: in panel b the line chart shows that the CC method has the lowest Kendall correlation coefficient, whereas in panel c the BC method has the lowest. That is, the coefficient of the BC method is higher than that of the CC method in panel b, and the situation is completely reversed in panel c, which shows that across data sets the BC and CC methods are unstable and fluctuate considerably. By contrast, across panels a, b, and c of Fig. 2, the method of the invention remains in the leading position, and its Kendall correlation coefficient does not change markedly with the data set, which demonstrates that the PPA method is stable. In summary, the stability and feasibility of the proposed PPA method are higher than those of the other three methods.
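For reference, the Kendall correlation coefficient τ between two score lists can be sketched as below. This is the simple tau-a form, which counts concordant and discordant pairs and ignores ties. The text does not state which two lists are compared in Fig. 2, so the inputs here are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    x, y: equal-length sequences of scores for the same nodes.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four nodes from two ranking methods.
example = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

A value of 1 means that the two lists order the nodes identically, and a value of −1 means that the orders are exactly reversed.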

Comparison of the number of activated nodes identified

Fig. 3 shows, on three real data sets, the comparison between the PPA method and the three other methods in the number of activated nodes identified. The three groups of plots in Fig. 3 show that the method of the invention has an advantage over the other three methods in this number, which indicates that the method is correct and feasible and has higher identification accuracy. Panels a, b, and c of Fig. 3 show the comparison on the data sets PGP, USAir97, and Email, respectively. First, taking the three groups a, b, and c together: although the number of activated nodes finally identified tends to a fixed value, the experimental results show that the number of activated vertices identified by the PPA method is generally higher than for the DC, CC, and BC methods. In Fig. 3 a(1), on data set PGP, the number of activated nodes identified by our method is about the same as for the DC method before time t = 3, but for t > 3 the advantage of the PPA method becomes obvious. In Fig. 3 a(2), the advantage of the PPA method appears from the start of propagation, and the PPA method stays well ahead of the CC method throughout. In Fig. 3 a(3), the number identified by the method of the invention is at first almost the same as for the BC method, but the advantage of our method finally appears for t > 5. The experimental results in Fig. 3 b(2) and b(3) show clearly that the advantage of the method of the invention appears from the start of propagation. Even though the CC, BC, and PPA methods each finally settle at a fixed value, the line for the final number of activations of the PPA method is always above those of the DC, CC, and BC methods; that is, the PPA method identifies more activated nodes than the other three methods. Finally, the line chart in Fig. 3 c(3) shows a process of dynamic change: for t < 3 the line of the PPA method is entirely below that of the BC method, i.e. the PPA method identifies fewer activated nodes than the BC method, because the network changes dynamically and the dynamics at each moment are unknown. After t = 3, however, the line of the PPA method is above that of the BC method, and in the final number of activations in c(3) the method of the invention is higher than the BC method. In summary, the three groups of experimental plots in Fig. 3 show that the PPA method identifies more activated nodes than the other methods, which illustrates that the method is correct and effective. Compared with DC, CC, and BC in terms of the accuracy of identifying activated nodes, the method provided by the invention achieves higher identification accuracy.
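The text does not describe how the activation counts in Fig. 3 were produced. As a hedged sketch of one common approach, the code below simulates the standard independent cascade model, in which each newly activated node gets a single chance to activate each inactive neighbor, and averages the final number of activated nodes over repeated runs. The graph layout and the function names are hypothetical.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One run of the independent cascade model; returns the activated set.

    graph: dict node -> dict neighbor -> activation probability.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each active node tries each inactive neighbor exactly once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def average_spread(graph, seeds, runs=1000, seed=0):
    """Mean number of activated nodes over repeated simulations."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs
```

Comparing `average_spread` for seed sets chosen by different methods is one way to obtain counts of the kind plotted in Fig. 3.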

Claims

1. An influence maximization method for the independent cascade model based on propagation path analysis, characterized by the following steps:

(1) determine the seed nodes of the initial propagation in a complex network;

(2) generate propagation paths: a seed node attempts to activate any node in the network, and if the activation succeeds, a propagation path is produced;

(3) construct the set of the top m shortest paths with the highest activation probability: the method is proposed on the basis of the independent cascade (IC) model, in which not only the shortest path of maximum probability but also the top m paths of maximum probability are considered;

(4) compute the final activation probability of the path set: it is computed according to the given formula;

(5) select node sets: a threshold is set, and the obtained activation probabilities are screened against it to produce new node sets;

(6) solve maximum coverage over the node sets: a greedy maximum coverage algorithm finally yields the seed set S that covers the most sets.

2. The influence maximization method for the independent cascade model based on propagation path analysis according to claim 1, characterized in that step (3), constructing the set of the top m shortest paths with the highest activation probability, is as follows: in addition to the maximum-probability shortest path, the method considers the top m maximum-probability shortest paths between the seed node u and the vertex v, rather than computing with the probability of a single shortest path alone; computing over the top m maximum-probability shortest paths makes the method more accurate, so that influential nodes in the network can be identified efficiently.

3. The influence maximization method for the independent cascade model based on propagation path analysis according to claim 1, characterized in that step (6), solving maximum coverage over the node sets, is as follows: the node sets are obtained through steps (3), (4), and (5); in the maximum coverage problem, the seed set S covers a node set obtained in step (5) when at least one element of S appears in that node set; the seed set S obtained by the maximum coverage algorithm can influence or activate the largest number of vertices in the complex network.