CN107392365A - Influence maximization method for the independent cascade model based on propagation path analysis - Google Patents

Info

Publication number
CN107392365A
Authority
CN
China
Prior art keywords
node
seed
activation
probability
propagation path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710568222.7A
Other languages
Chinese (zh)
Inventor
刘维
陈昕
吴蔷梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University
Priority to CN201710568222.7A
Publication of CN107392365A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The present invention relates to an influence maximization method for the independent cascade model based on propagation path analysis. The method takes a complex network as input and determines the seed nodes from which propagation starts, generates propagation paths, constructs the set of the m paths with the highest activation probability, computes the final activation probability of that path set, selects node sets, and finally obtains, through a greedy maximum coverage algorithm, the seed set S that covers the largest number of those sets. The invention overcomes the defects of existing methods, which are insufficiently accurate because they do not examine the environment around a node in depth, and whose high time complexity prevents their application to large-scale networks. Because the invention considers only the top-m maximum-probability paths between nodes, it avoids many unnecessary calculations, and a greedy solution over the identified influential node sets yields the optimal seed node set S.

Description

Influence maximization method for the independent cascade model based on propagation path analysis
Technical field
The invention belongs to the field of methods that use the independent cascade model to identify influence-maximizing nodes in complex networks, and in particular relates to an influence maximization method for the independent cascade model based on propagation path analysis.
Background
The important nodes of a complex network are the special nodes that, compared with the other nodes, affect the structure and function of the network to a greater extent. In recent years the identification of influential nodes has attracted increasingly wide attention, not only because of its theoretical significance but also because of its broad practical value.
With the development of the Internet, information pervades daily life, and searching for information has become an everyday task. To find the required information efficiently, or to follow current news, one can use a search engine or read the information published by influential spreaders in a network. Many social networks, such as Twitter and Delicious, let users communicate and publish information. Identifying influential users allows information to be released effectively, increasing both the breadth and the depth of its propagation. Different networks publish different information and operate in different ways. On interactive platforms such as Sina Weibo or Tencent Weibo in China, anyone can publish information, whether about personal interests or about current social hot topics, and users interact with one another. Users differ in influence, so the information they publish differs in how widely it spreads, whether it breaks out (spreads on a large scale), when the outbreak occurs, and how long it lasts. Information published by a highly influential user is quickly forwarded and followed, spreads widely, and soon becomes a social hot topic. Similarly, to control public opinion or a disease outbreak, one can identify the nodes that propagate most effectively and take measures at those nodes. Identifying influential nodes in complex networks is therefore a research topic of considerable theoretical and practical value. Many algorithms for identifying influential nodes have been designed, including degree centrality (DC), closeness centrality (CC) and betweenness centrality (BC).
Before the present invention, these methods had defects and deficiencies in identifying influential nodes. Degree centrality (DC) considers only the most local information about a node; it describes the node's most direct influence but does not examine the node's surrounding environment (such as its position in the network or its higher-order neighbors) in depth, so it is not accurate enough in many cases. Closeness centrality (CC) determines the centrality of a node from the relative distances between all node pairs and is widely used in research, but its time complexity is high, which makes it impractical in today's large-scale networks. Betweenness centrality (BC) also has high time complexity and is likewise not easily applied to large-scale networks.
Summary of the invention
The object of the present invention is to overcome the above defects by providing an influence maximization method for the independent cascade model based on propagation path analysis.
The technical solution of the invention is as follows:
An influence maximization method for the independent cascade model based on propagation path analysis, characterized mainly by the following steps:
(1) Input the complex network and determine the seed nodes from which propagation starts;
(2) Generate propagation paths: a seed node attempts to activate any vertex in the network, and each successful activation produces a propagation path;
(3) Construct the set of the top-m paths with the highest activation probability: the maximum-probability path is obtained with a single-source shortest path method. Because the method is based on the independent cascade (IC) model, it considers not only the single maximum-probability path but also the top-m maximum-probability paths;
(4) Compute the final activation probability of the path set: the set of top-m maximum-probability paths obtained in step (3) is used to compute the final activation probability according to the propagation path formula proposed here;
(5) Select node sets: a threshold is set, and the activation probabilities obtained in step (4) are screened against it to produce new node sets;
(6) Solve maximum coverage over the node sets: a greedy maximum coverage algorithm finally yields the seed set S that covers the largest number of sets.
Constructing the set of top-m maximum-probability paths in step (3): in addition to the single maximum-probability path, the method considers the top-m maximum-probability paths between seed node u and vertex v, rather than basing the calculation on the probability of one path alone. Analyzing the top-m maximum-probability paths makes the proposed method more accurate and therefore identifies the influential nodes in the network more efficiently.
Solving maximum coverage over the node sets in step (6): the node sets are obtained through steps (3), (4) and (5). A seed set S covers a node set obtained in step (5) when at least one element of S appears in that node set. The seed set S obtained by the maximum coverage algorithm influences or activates the largest number of vertices in the complex network.
The advantages and effects of the invention are as follows. It proposes a new propagation path model that finds, in a complex network, the top-m maximum-probability paths from a given node to the other nodes. From this model it derives the node sets most likely to activate each node, and a greedy maximum coverage algorithm then yields the final seed set S. This makes the prediction results more accurate and more reliable. Because the method considers only the top-m maximum-probability paths between nodes, it avoids many unnecessary calculations, and a greedy solution over the identified influential node sets yields the optimal seed node set S. The invention improves the efficiency of identifying influential nodes in a network and extends the applicability and practicality of the technique in the field of influence maximization.
Brief description of the drawings
Fig. 1: Schematic flow chart of the invention.
Fig. 2: Comparison of the Kendall correlation coefficient τ of the invention and of the prior methods. (a) Comparison of τ on the USAir97 data set; (b) comparison of τ on the PGP data set; (c) comparison of τ on the Email data set.
Fig. 3: Comparison of the number of activated nodes for the invention and the other methods. (a) Comparison on the PGP data set: a(1) PPA versus DC, a(2) PPA versus CC, a(3) PPA versus BC. (b) Comparison on the USAir97 data set: b(1) PPA versus DC, b(2) PPA versus CC, b(3) PPA versus BC. (c) Comparison on the Email data set: c(1) PPA versus DC, c(2) PPA versus CC, c(3) PPA versus BC.
Detailed description
I. Description of the steps
The invention is described in detail below with reference to the drawings and the embodiment.
First, the complex network and the seed nodes are input.
Step (1): Generate propagation paths
Let u be a seed node. If a vertex v is successfully activated starting from u, a propagation path from u to v is formed, denoted l_{uv}. Let
l_{uv} = (e_1, e_2, \ldots, e_m)
that is, the propagation path from node u to node v consists of the edges e_1, e_2, \ldots, e_m connected in sequence. The probability that node u successfully activates node v along this path is then defined as
P(l_{uv}) = \prod_{i=1}^{m} P(e_i)
where P(e_i) is the activation probability on edge e_i. Proceeding in this way yields the initial propagation paths.
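As a minimal sketch of this definition, assuming (as is usual in the independent cascade model) that the probability of a path is the product of the activation probabilities of its edges, the calculation can be written as follows. The function name `path_probability` and the example edge probabilities are hypothetical and chosen only for illustration.

```python
from math import prod

def path_probability(path_edges, edge_prob):
    """Probability that activation succeeds on every edge of one path.

    path_edges: list of (tail, head) edges forming the path from u to v.
    edge_prob:  dict mapping each edge to its activation probability P(e).
    """
    return prod(edge_prob[e] for e in path_edges)

# Hypothetical three-edge path u -> a -> b -> v.
edge_prob = {("u", "a"): 0.5, ("a", "b"): 0.4, ("b", "v"): 0.5}
p = path_probability([("u", "a"), ("a", "b"), ("b", "v")], edge_prob)
```

Longer paths can only lower the product, which is why the later steps concentrate on a small number of high-probability paths.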
Step (2): Construct the set of the top-m paths with the highest activation probability
There is usually more than one path from node u to node v, and the path with the highest probability is chosen. This path can be obtained with a single-source shortest path algorithm: each edge e in G is assigned the weight -\ln p(e), and the shortest paths from node u to the other vertices are then computed. Minimizing the sum of these weights along a path maximizes the product of the edge probabilities. Because the method is based on the independent cascade (IC) model, it considers not only the single maximum-probability path but also the top-m maximum-probability paths, as follows:
Let G_1 = G and find the maximum-probability path from node u to node v in G. Graphs G_1, G_2, \ldots, G_m are then constructed in turn.
In each G_i, the maximum-probability path from node u to node v is found.
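The weighting trick in this step can be sketched with Dijkstra's algorithm on edge weights -ln p(e). This is only an illustration of finding a single maximum-probability path; how each G_i is derived from the previous graph is not spelled out in this text, so the sketch does not attempt the top-m construction. The function name, the adjacency-dict graph format, and the example graph are assumptions.

```python
import heapq
from math import exp, log

def max_probability_path(graph, source, target):
    """Return (probability, path) of the most probable path source -> target.

    graph: {node: {neighbor: p}} with 0 < p <= 1 on every edge.
    Returns (0.0, []) when target cannot be reached.
    """
    dist = {source: 0.0}          # accumulated -ln(p) along the best path
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, {}).items():
            nd = d - log(p)       # edge weight is -ln p(e) >= 0
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return 0.0, []
    path = [target]
    while path[-1] != source:     # walk predecessors back to the source
        path.append(prev[path[-1]])
    path.reverse()
    return exp(-dist[target]), path

# Hypothetical graph: the two-hop route (0.9 * 0.9) beats the direct edge (0.5).
graph = {"u": {"a": 0.9, "v": 0.5}, "a": {"v": 0.9}}
prob, path = max_probability_path(graph, "u", "v")
```

Because every probability is at most 1, every weight is non-negative, which is the condition Dijkstra's algorithm needs.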
Step (3): Compute the final activation probability of the path set
This step uses the graphs G_i obtained in step (2): in each G_i the maximum-probability path from node u to node v can be found. Let L_{uv} denote the set of these m maximum-probability paths from u to v. The probability that v is activated with u as the seed node, P(L_{uv}), is then computed from this set.
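The combination formula itself does not appear in this text. Purely as an illustration, and not as the patent's own formula, one common way to combine several path probabilities is to treat the paths as independent and compute the probability that at least one of them succeeds. The function name below is hypothetical, and the independence assumption is only approximate when paths share edges.

```python
def combined_activation_probability(path_probs):
    """Probability that at least one path succeeds, ASSUMING independent paths.

    path_probs: iterable with the probability of each of the m paths.
    This independence rule is an illustrative assumption, not the source's formula.
    """
    not_activated = 1.0
    for p in path_probs:
        not_activated *= (1.0 - p)   # every path must fail for v to stay inactive
    return 1.0 - not_activated

# Two hypothetical paths, each succeeding with probability 0.5.
p_two = combined_activation_probability([0.5, 0.5])
```

Under this assumption, adding a path can never lower the combined probability, which matches the motivation for looking beyond the single best path.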
Step (4): Select node sets
Iterating step (3) gives, for every vertex pair (u, v), the value P(L_{uv}), the probability that v is activated with u as the seed node. To find the initial seed set S that activates the most vertices with the highest probability, a threshold θ is set as follows:
Set P(L_{uv}) = 0 whenever P(L_{uv}) < θ.
Then, for each vertex v, the set
R(v) = { u | P(L_{uv}) > θ }
is the set of vertices most likely to activate v.
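The thresholding step can be sketched as a simple filter over pairwise activation probabilities. The dictionary layout, the function name `candidate_activators`, and the example values are assumptions made for illustration.

```python
def candidate_activators(activation_prob, theta):
    """Build R(v) = {u : P(u activates v) > theta} for every target v.

    activation_prob: dict mapping (u, v) to the final activation probability.
    Targets whose probabilities all fall at or below theta get an empty set.
    """
    R = {}
    for (u, v), p in activation_prob.items():
        R.setdefault(v, set())
        if p > theta:
            R[v].add(u)
    return R

# Hypothetical probabilities for three (seed, target) pairs.
probs = {("a", "c"): 0.6, ("b", "c"): 0.1, ("a", "d"): 0.05}
R = candidate_activators(probs, theta=0.2)
```

A higher θ gives smaller sets R(v) and a cheaper coverage step, at the cost of discarding weak but possibly useful activators.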
Step (5): Solve maximum coverage over the node sets
This step solves the maximum coverage problem over the vertex sets R(v_1), R(v_2), …, R(v_n) obtained in step (4): find a set S with |S| = k such that S covers as many of the sets R(v_1), R(v_2), …, R(v_n) as possible.
S is said to cover R(v_i) if at least one element of S appears in R(v_i), that is, S ∩ R(v_i) ≠ ∅. Let F_k(S) denote the number of sets among R(v_1), R(v_2), …, R(v_n) that S covers.
Finally, the seed set S is output: the set of seed nodes with the greatest influence on the whole network.
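A minimal sketch of a greedy heuristic for this coverage problem is given below: at each round it picks the node that appears in the largest number of still-uncovered sets. The function name, the alphabetical tie-breaking rule, and the example sets are assumptions; the text names a greedy maximum coverage algorithm without giving its details.

```python
def greedy_max_cover(R, k):
    """Greedily pick up to k seeds covering as many sets R(v) as possible.

    R: dict mapping each target v to the set of nodes likely to activate it.
    Returns (seeds, number_of_covered_sets). Ties go to the smallest node name.
    """
    uncovered = {v for v, s in R.items() if s}
    seeds = []
    for _ in range(k):
        gain = {}
        for v in uncovered:
            for u in R[v]:
                gain[u] = gain.get(u, 0) + 1   # sets newly covered by choosing u
        if not gain:
            break                              # everything coverable is covered
        best = max(sorted(gain), key=gain.get)
        seeds.append(best)
        uncovered = {v for v in uncovered if best not in R[v]}
    covered = sum(1 for s in R.values() if s & set(seeds))
    return seeds, covered

# Hypothetical candidate sets for four targets.
R = {"v1": {"a", "b"}, "v2": {"a"}, "v3": {"c"}, "v4": {"b", "c"}}
seeds, covered = greedy_max_cover(R, k=2)
```

Greedy selection is a standard heuristic for maximum coverage; it does not always find the best possible set, but it avoids enumerating every size-k subset.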
II. Embodiment
Comparison of the Kendall correlation coefficient τ
Fig. 2 compares the PPA method with three other methods on three real data sets, using the Kendall correlation coefficient τ as the evaluation metric, and shows that the method of the invention outperforms the other three in feasibility. Panels a, b and c of Fig. 2 show the comparison of τ on the USAir97, PGP and Email data sets respectively. The method of the invention, the influence maximization method for the independent cascade model based on propagation path analysis (PPA), is compared with three existing methods: degree centrality (DC), closeness centrality (CC) and betweenness centrality (BC). The three plots in Fig. 2 show clearly that τ for PPA stays ahead of the other three methods, which indicates that the proposed method is feasible and effective. In Fig. 2c, τ for PPA is at first slightly below that of DC, but as the spreading rate increases PPA again exceeds DC. Comparing panels b and c: in panel b the line chart shows that CC has the lowest τ, whereas in panel c BC has the lowest τ. In panel b BC is above CC, while in panel c the order is reversed, which shows that BC and CC are not stable across data sets and fluctuate considerably. By contrast, across panels a, b and c of Fig. 2 the method of the invention remains in the leading position, and its Kendall correlation coefficient does not shift with the data set, which demonstrates that PPA is stable. In summary, the stability and feasibility of the proposed PPA method are higher than those of the other three methods.
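For reference, the Kendall coefficient τ measures how well two rankings of the same items agree. The text does not state which two rankings are compared; a common setup, assumed here, compares the scores a method assigns to nodes with their simulated spreading influence. The sketch below computes the tie-free variant (tau-a) in plain Python; the function name and sample scores are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists.

    Counts concordant minus discordant pairs, divided by the number of pairs.
    Tied pairs count as neither, so this equals tau-b only when there are no ties.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical method scores versus simulated influence for four nodes.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

A value of 1 means the two rankings agree on every pair of nodes, and -1 means they disagree on every pair.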
Comparison of the number of identified activated nodes
Fig. 3 compares PPA with the other three methods on three real data sets in terms of the number of activated nodes identified. The three groups of plots show that the method of the invention has an advantage over the other three methods in the number of activated nodes, which indicates that the method is correct and feasible and has higher identification accuracy. Panels a, b and c of Fig. 3 show the comparison on the PGP, USAir97 and Email data sets respectively. First, taking the three groups together, although the number of activated nodes finally identified tends to a fixed value, the experimental results show that the number identified by PPA is consistently higher than for DC, CC and BC. In Fig. 3 a(1), on the PGP data set, the number of activated nodes identified by PPA is about the same as for DC before t = 3, but for t > 3 the advantage of PPA is evident. In Fig. 3 a(2), the advantage of PPA appears from the start of propagation, and PPA stays well ahead of CC throughout. In Fig. 3 a(3), the number identified by the method of the invention is at first almost the same as for BC, but its advantage finally appears for t > 5. Fig. 3 b(2) and b(3) show clearly that the advantage of the method appears from the moment propagation starts; although CC, BC and PPA all finally settle at a fixed value, the line for the final number of activated nodes for PPA is always above those of DC, CC and BC, that is, PPA identifies more activated nodes than the other three methods. Finally, the line chart in Fig. 3 c(3) shows a dynamic process: for t < 3 the PPA line is below the BC line, meaning that PPA identifies fewer activated nodes than BC, because the network changes dynamically and its state at each moment is unknown; after t = 3, however, the PPA line is above the BC line, and in the final numbers of activated nodes in c(3) the method of the invention is above BC. In summary, the three groups of experimental plots in Fig. 3 show that PPA identifies more activated nodes than the other methods, which illustrates that the method is correct and effective. Compared with DC, CC and BC in the accuracy of identifying activated nodes, the method provided by the invention also achieves higher identification accuracy.
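Curves of activated nodes against time t, like those described for Fig. 3, are typically produced by simulating the independent cascade model from a chosen seed set. The sketch below shows one simulation run; the function name, the graph format, and the example graph are assumptions, and this text does not describe the exact experimental procedure.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One run of the independent cascade model.

    graph: {node: {neighbor: activation probability}}.
    Returns the cumulative number of active nodes at t = 0, 1, 2, ...
    Each newly activated node gets exactly one chance to activate each neighbor.
    """
    active = set(seeds)
    frontier = list(seeds)
    counts = [len(active)]
    while frontier:
        new = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
        if new:
            counts.append(len(active))
    return counts

# Hypothetical graph with probabilities 1.0 and 0.0 so the outcome is deterministic.
graph = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}}
counts = simulate_ic(graph, ["a"], random.Random(0))
```

In practice such a simulation is repeated many times with random edge outcomes and the counts are averaged before seed sets from different methods are compared.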

Claims (3)

1. An influence maximization method for the independent cascade model based on propagation path analysis, characterized by the following steps:
(1) determining the seed nodes from which propagation starts in a complex network;
(2) generating propagation paths: a seed node attempts to activate any node in the network, and each successful activation produces a propagation path;
(3) constructing the set of the top-m paths with the highest activation probability: the method is based on the independent cascade (IC) model and considers not only the single maximum-probability path but also the top-m maximum-probability paths;
(4) computing the final activation probability of the path set according to the given formula;
(5) selecting node sets: a threshold is set, and the activation probabilities obtained are screened against it to produce new node sets;
(6) solving maximum coverage over the node sets: a greedy maximum coverage algorithm finally yields the seed set S that covers the largest number of sets.
2. The influence maximization method for the independent cascade model based on propagation path analysis according to claim 1, characterized in that, in constructing the set of the top-m paths with the highest activation probability in step (3), the method considers, in addition to the single maximum-probability path, the top-m maximum-probability paths between seed node u and vertex v, rather than basing the calculation on the probability of one path alone; analyzing the top-m maximum-probability paths makes the proposed method more accurate and therefore identifies the influential nodes in the network more efficiently.
3. The influence maximization method for the independent cascade model based on propagation path analysis according to claim 1, characterized in that, in solving maximum coverage over the node sets in step (6), the node sets are obtained through steps (3), (4) and (5); a seed set S covers a node set obtained in step (5) when at least one element of S appears in that node set; and the seed set S obtained by the maximum coverage algorithm influences or activates the largest number of vertices in the complex network.
CN201710568222.7A 2017-07-11 2017-07-11 The maximizing influence method of independent cascade model based on propagation path analysis Pending CN107392365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710568222.7A CN107392365A (en) 2017-07-11 2017-07-11 The maximizing influence method of independent cascade model based on propagation path analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710568222.7A CN107392365A (en) 2017-07-11 2017-07-11 The maximizing influence method of independent cascade model based on propagation path analysis

Publications (1)

Publication Number Publication Date
CN107392365A true CN107392365A (en) 2017-11-24

Family

ID=60340566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710568222.7A Pending CN107392365A (en) 2017-07-11 2017-07-11 The maximizing influence method of independent cascade model based on propagation path analysis

Country Status (1)

Country Link
CN (1) CN107392365A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549632A (en) * 2018-04-03 2018-09-18 重庆邮电大学 A kind of social network influence power propagation model construction method based on sentiment analysis
CN111695043A (en) * 2020-06-16 2020-09-22 桂林电子科技大学 Social network blocking influence maximization method based on geographic area
CN111835537A (en) * 2019-04-17 2020-10-27 中国移动通信集团山西有限公司 Method, device and equipment for identifying nodes in communication network cascade fault
CN113378470A (en) * 2021-06-22 2021-09-10 常熟理工学院 Time sequence network-oriented influence maximization method and system
CN114640643A (en) * 2022-02-21 2022-06-17 华南理工大学 Information cross-community propagation maximization method and system based on group intelligence

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549632A (en) * 2018-04-03 2018-09-18 重庆邮电大学 A kind of social network influence power propagation model construction method based on sentiment analysis
CN108549632B (en) * 2018-04-03 2022-02-11 重庆邮电大学 Social network influence propagation model construction method based on emotion analysis
CN111835537A (en) * 2019-04-17 2020-10-27 中国移动通信集团山西有限公司 Method, device and equipment for identifying nodes in communication network cascade fault
CN111835537B (en) * 2019-04-17 2022-11-29 中国移动通信集团山西有限公司 Method, device and equipment for identifying nodes in communication network cascade fault
CN111695043A (en) * 2020-06-16 2020-09-22 桂林电子科技大学 Social network blocking influence maximization method based on geographic area
CN113378470A (en) * 2021-06-22 2021-09-10 常熟理工学院 Time sequence network-oriented influence maximization method and system
CN114640643A (en) * 2022-02-21 2022-06-17 华南理工大学 Information cross-community propagation maximization method and system based on group intelligence
CN114640643B (en) * 2022-02-21 2023-11-21 华南理工大学 Information cross-community propagation maximization method and system based on group intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171124