CN113553541B - Independent path analysis-based source positioning method under independent cascade model - Google Patents

Info

Publication number
CN113553541B
CN113553541B CN202110623693.XA CN202110623693A CN113553541B CN 113553541 B CN113553541 B CN 113553541B CN 202110623693 A CN202110623693 A CN 202110623693A CN 113553541 B CN113553541 B CN 113553541B
Authority
CN
China
Prior art keywords
source
node
nodes
observation
independent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110623693.XA
Other languages
Chinese (zh)
Other versions
CN113553541A (en)
Inventor
刘维
江滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University
Priority to CN202110623693.XA
Publication of CN113553541A
Application granted
Publication of CN113553541B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The source positioning method based on independent path analysis under the independent cascade model studies how the influence of different sources propagates at different moments. On the premise that infected nodes with the same time difference are more likely to have been infected by the same source, the method analyzes each independent path separately and locates the possible sources from the observer sets that occur most often at the same time on the corresponding independent paths, which improves positioning accuracy. Most traditional source positioning methods assume by default that all sources start infecting at the same moment and ignore the possibility that different source nodes begin to exert influence at different moments. In the present method, after the independent paths are generated, the exploration process of each source node is analyzed independently on the basis of those paths. This can improve the efficiency of identifying the source nodes of influence propagation in a social network and broadens the application range and practicability of the technique in the field of source positioning.

Description

Independent path analysis-based source positioning method under independent cascade model
Technical Field
One or more embodiments of the present disclosure relate to the field of source positioning technologies, and in particular, to a source positioning method based on independent path analysis under an independent cascade model.
Background
In the real world, many complex systems can be described as complex networks, such as protein interaction networks and social networks. In social networks, information is spreading all the time, and in modern society dynamic propagation processes are ubiquitous as well, e.g., virus intrusion in computer networks and the spread of air or water pollution. Therefore, how to identify propagation sources accurately and quickly is an important task in network science research. A large number of scholars have already studied the problem of locating propagation sources in social networks.
The development of the Internet has created a "screen social age" in which social networking applications have sprung up and people spend more time and effort than ever moving through the vast relational networks woven by the Internet. The social networking service (SNS) is an application architecture under the Web 2.0 system. Friend relationships are established by users following one another, and social circles gradually expand through blogs, microblogs, sharing, and similar channels, finally forming a complex and huge social relationship network. Facebook, the largest social networking site in the world, has more than a billion registered users and about 750 million daily active users. Twitter, the pioneer of microblogging sites, has more than 500 million users, and the content shared every day reaches the billions. Domestic social networking sites such as Renren and Sina Weibo are also developing rapidly. While social networks provide the public with unprecedented information resources, the problem of network information content security is becoming increasingly prominent. For example, in April 2013 the United States stock market suffered a brief but intense sell-off after hackers took over the Associated Press account and released false news claiming that the White House had been attacked by terrorists. Once released, the news spread rapidly across Twitter, causing extreme market panic and very large economic losses. Therefore, quickly finding the propagation source after an outbreak of harmful public opinion is important for controlling the spread of rumors and maintaining the stability and security of public order.
Common source location problems include single-source and multi-source location. For the single-source problem, researchers have proposed the Jordan Center Estimation (JCE) algorithm, which estimates a single source as the Jordan infection center of a tree network. They have also studied an unbiased betweenness algorithm, which computes the unbiased betweenness centrality of all nodes in the infection graph, and a Monte Carlo source location algorithm for a single source with known node states. For multi-source location, Prakash et al. proposed the NetSleuth algorithm, based on the minimum description length principle, for source identification with multiple sources under the SI propagation model, and Wang et al. proposed a source identification method based on label propagation, which was the first attempt to handle the case where the propagation model is unknown in practice. With respect to network type, Shah et al. proposed the first algorithm for source identification under the SI propagation model on static regular-tree networks; Fioriti et al. proposed a propagation source identification algorithm, called DA, that computes the dynamic age of each node in the infection graph; and Lokhov et al. developed a dynamic message passing algorithm for single-source identification.
However, existing source positioning algorithms place corresponding requirements on the network structure and the propagation model, whereas the present algorithm does not need to consider the influence of the network structure and the propagation model on the experiments and is therefore general. Furthermore, the accuracy with which current source positioning algorithms estimate the complete set of source nodes still needs to be improved.
Disclosure of Invention
In view of this, an object of one or more embodiments of the present disclosure is to provide a source positioning method based on independent path analysis under an independent cascade model, so as to solve the problem of low accuracy of the existing source positioning algorithm.
In view of the above object, one or more embodiments of the present disclosure provide a source positioning method based on independent path analysis under an independent cascade model, including:
determining a set of source nodes for initial propagation in a complex network;
performing independent cascade model diffusion according to a source node set, infecting nodes in the whole network until the infected nodes are no longer generated, and extracting the infected nodes to form an infected network;
randomly selecting a plurality of nodes from an infection network as observation nodes, starting from the observation nodes, performing back diffusion to form an independent path set, and recording the infection time difference of each observation node;
recording the observation nodes that share the same infection time difference as a coverage group, each coverage group having its corresponding observation nodes, namely $D_{ij} = \{O_{ij1}, O_{ij2}, \ldots, O_{ijk}\}$, wherein $D_{ij}$ denotes the j-th coverage group and $O_{ijk}$ denotes an observation node in coverage group $D_{ij}$; the infection time difference corresponding to each coverage group $D_{ij}$ is the time at which vertex $V_i$ starts propagating, vertex $V_i$ being an infected node reached from observation node $O_i$; assigning each observation node $O_i$ to an observation point set $G_i$; successively selecting the observation point set with the smallest cardinality, taking the corresponding $V_i$ as a source node, and deleting the selected observation nodes from the remaining observation point sets, until all observation nodes have been selected;
combining the path nodes corresponding to the observation point sets $G_i$ obtained in each round to obtain the source seed set.
Preferably, determining the set of source nodes for initial propagation in the complex network comprises:
randomly selecting a plurality of nodes with relatively high node degree in the complex network as the source node set.
Preferably, back diffusing forms a set of independent paths comprising:
for the infected network, defining a propagation probability matrix over the edges, with P taken as the probability threshold of each edge, such that a source passing along an edge can reach the next node only when the probability is greater than or equal to P; meanwhile, randomly assigning each edge a value as its path length, and generating an independent path for each node in the network.
Preferably, selecting the observation point set with the smallest cardinality, taking the corresponding $V_i$ as a source node, and deleting the selected observation nodes from the remaining observation point sets comprises:
for each observation node $O_i$, setting $G(O_i) = \{D_{gh} \mid O_i \in D_{gh}\}$, wherein $G(O_i)$ collects, from all coverage groups, the coverage groups that contain observation node $O_i$; for each $G(O_i)$, the infected node $V_g$ corresponding to at least one $D_{gh}$ is selected as a seed source; the minimum of $|G(O_i)|$ is selected, the corresponding $V_i$ is taken as the source node, and the observation nodes already selected are deleted from the remaining $G(O_i)$.
Preferably, the independent paths are formed by applying Dijkstra's shortest-path algorithm.
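As an illustration of the path-generation step, the following is a minimal Python sketch, not the implementation of the disclosure. It assumes the infected network is given as an adjacency dictionary (a hypothetical format that lists each undirected edge in both directions), treats P as a threshold below which an edge is ignored, and runs Dijkstra's algorithm from one observation node over the remaining edges. The names `independent_paths`, `path_to`, and `p_threshold` are illustrative only.

```python
import heapq


def independent_paths(edges, start, p_threshold):
    """Dijkstra from `start`, using only edges with probability >= p_threshold.

    edges: dict mapping node -> list of (neighbor, length, probability).
    Returns (dist, parent): shortest distances and the predecessor of each
    reachable node on its shortest path.
    """
    dist = {start: 0.0}
    parent = {start: None}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, length, prob in edges.get(u, []):
            if prob < p_threshold:
                continue  # assumption: edges below the threshold carry no influence
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def path_to(parent, target):
    """Rebuild the path from the start node to `target` from the parent map."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


# Hypothetical toy network: (neighbor, length, probability) per edge.
toy_edges = {
    "a": [("b", 1.0, 0.9), ("c", 5.0, 0.9)],
    "b": [("c", 1.0, 0.2), ("d", 2.0, 0.8)],
    "c": [("d", 1.0, 0.9)],
    "d": [],
}
toy_dist, toy_parent = independent_paths(toy_edges, "a", p_threshold=0.5)
```

With the threshold at 0.5 the edge from b to c is skipped, so c is reached only through the longer direct edge. Filtering before the distance comparison is one way to give the probability condition priority over path length.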
As can be seen from the above, the source positioning method based on independent path analysis under the independent cascade model provided by one or more embodiments of the present disclosure offers a new approach to source positioning. Most traditional source positioning methods assume by default that all sources start infecting at the same moment and ignore the possibility that different source nodes begin to exert influence at different moments; in the present method, after the independent paths are generated, the exploration process of each source node is analyzed independently on the basis of those paths.
Drawings
For a clearer description of one or more embodiments of the present description or of the solutions of the prior art, the drawings that are necessary for the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are only one or more embodiments of the present description, from which other drawings can be obtained, without inventive effort, for a person skilled in the art.
FIG. 1 is a flow diagram of a source localization method according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic comparison of the observer proportion against the AUROC value for one or more embodiments of the present disclosure and other existing methods under different numbers of sources, wherein the number of sources is 1, the number of nodes n = 200, and the average degree = 5;
FIG. 3 is a schematic comparison of the observer proportion against the AUROC value for one or more embodiments of the present disclosure under different numbers of sources, wherein the number of sources is 5, the number of nodes n = 200, and the average degree = 5;
FIG. 4 is a schematic comparison of the observer proportion against the AUROC value for one or more embodiments of the present disclosure under different numbers of sources, wherein the number of sources is 10, the number of nodes n = 200, and the average degree = 5;
FIG. 5 is a graph comparing the observer proportion against the AUROC value for the present invention and other methods under different average degrees, wherein the number of sources is 10, the number of nodes n = 200, and the average degree = 2;
FIG. 6 is a graph comparing the observer proportion against the AUROC value for the present invention and other methods under different average degrees, wherein the number of sources is 10, the number of nodes n = 200, and the average degree = 10;
FIG. 7 is a graph comparing the observer proportion against the AUROC value for the present invention and other methods under different average degrees, wherein the number of sources is 10, the number of nodes n = 200, and the average degree = 18;
FIG. 8 is a graph comparing the observer proportion against the F1 score for the present invention and other methods on three virtual networks, wherein the number of sources is 10, the number of nodes n = 400, and the average degree = 10;
FIG. 9 is a graph comparing the observer proportion against the F1 score for the present invention and other methods on three real networks, wherein the number of sources is 10, the number of nodes n = 400, and the average degree = 10;
FIG. 10 is a graph comparing the observer proportion against pre for the present invention and other methods on three virtual networks, wherein the number of sources is 10, the number of nodes n = 400, and the average degree = 10;
FIG. 11 is a schematic heat map of the IMLM algorithm of one or more embodiments of the present disclosure for node numbers n = 100, 200, 300, 400, and 500;
FIG. 12 is a schematic heat map of the MSL-TK algorithm of one or more embodiments of the present disclosure for node numbers n = 100, 200, 300, 400, and 500;
FIG. 13 is a graph of the observer proportion against the AUROC value for one or more embodiments of the present disclosure under different numbers of sources, wherein the number of sources is 1, the number of nodes n = 200, and the average degree = 10;
FIG. 14 is a graph of the observer proportion against the AUROC value for one or more embodiments of the present disclosure under different numbers of sources, wherein the number of sources is 2, the number of nodes n = 200, and the average degree = 10;
FIG. 15 is a graph of the observer proportion against the AUROC value for one or more embodiments of the present disclosure under different numbers of sources, wherein the number of sources is 3, the number of nodes n = 200, and the average degree = 10;
FIG. 16 is a graph of the prediction accuracy of each algorithm against the observer proportion in a real network for the present invention and other methods, wherein the number of sources is 1, the number of nodes n = 400, and the average degree = 10;
FIG. 17 is a graph of the prediction accuracy of each algorithm against the observer proportion in a real network for the present invention and other methods, wherein the number of sources is 2, the number of nodes n = 400, and the average degree = 10;
FIG. 18 is a graph of the prediction accuracy of each algorithm against the observer proportion in a real network for the present invention and other methods, wherein the number of sources is 3, the number of nodes n = 400, and the average degree = 10.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made in detail to the following specific examples.
It is noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present disclosure should be taken in a general sense as understood by one of ordinary skill in the art to which the present disclosure pertains. The use of the terms "first," "second," and the like in one or more embodiments of the present description does not denote any order, quantity, or importance, but rather the terms "first," "second," and the like are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The embodiment of the specification provides a source positioning method based on independent path analysis under an independent cascade model, which comprises the following steps:
S101, determining the set of source nodes for initial propagation in a complex network;
for example, before the set of source nodes is determined, the complex network and the seed nodes are input.
S102, performing independent cascade (IC) model diffusion from the source node set, infecting nodes throughout the network until no new infected nodes are produced, and extracting the infected nodes to form an infected network, which makes the final verification of the algorithm convenient.
S103, randomly selecting a plurality of nodes from the infected network as observation nodes (Observed), performing back diffusion from the observation nodes to form a set of independent paths, and recording the infection time difference Δt of each observation node. For example, for the infected network, a propagation probability matrix is defined over the edges, with P taken as the probability threshold of each edge: a source passing along an edge can reach the next node only when the probability is greater than or equal to P. Each edge is randomly assigned a value as its path length, and an independent path is generated for each node in the network. The independent path is the shortest path produced by Dijkstra's algorithm; while the shortest distance is given priority, the probability between seed node u and vertex v must also be considered, so an independent path is generated by the shortest-path algorithm only where the probability exceeds the threshold.
S104, assuming that the number of source nodes and the specific nodes are known and that the sources emit influence at different time points, the equal values of Δt are first found and sorted, so that equal values end up adjacent. The observation nodes with the same infection time difference are recorded as one coverage group. The node set that satisfies the same infection time difference is denoted $C(i)$, and $C(i)$ may contain several coverage groups, denoted $D_{i1}, D_{i2}, \ldots, D_{ig}$. Each coverage group has its corresponding observation nodes, $D_{ij} = \{O_{ij1}, O_{ij2}, \ldots, O_{ijk}\}$, wherein $D_{ij}$ denotes the j-th coverage group and $O_{ijk}$ denotes an observation node in coverage group $D_{ij}$. The infection time difference corresponding to each coverage group $D_{ij}$ is the time at which vertex $V_i$ starts propagating, vertex $V_i$ being an infected node reached from observation node $O_i$. Each observation node $O_i$ is assigned to an observation point set $G_i$; the observation point set with the smallest cardinality is selected in turn, the corresponding $V_i$ is taken as a source node, and the selected observation nodes are deleted from the remaining observation point sets, until all observation nodes have been selected;
for example, successively selecting the observation point set with the smallest cardinality, taking the corresponding $V_i$ as a source node, and deleting the selected observation nodes from the remaining observation point sets includes:
for each observation node $O_i$, setting $G(O_i) = \{D_{gh} \mid O_i \in D_{gh}\}$, wherein $G(O_i)$ collects, from all coverage groups, the coverage groups that contain observation node $O_i$; for each $G(O_i)$, the infected node $V_g$ corresponding to at least one $D_{gh}$ is selected as a seed source; the minimum of $|G(O_i)|$ is selected, the corresponding $V_i$ is taken as the source node, and the observation nodes already selected are deleted from the remaining $G(O_i)$.
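The grouping part of step S104 can be sketched as follows. This is one possible reading, not the implementation of the disclosure: for a candidate node and an observation node, the infection time difference is taken to be the observer's infection time minus the delay along the independent path between them, which is the start time the candidate would need in order to explain that observation. Observers that imply the same start time for the same candidate form one coverage group. The function name, the input dictionaries, and the rounding tolerance `ndigits` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_coverage_groups(observed_times, path_delay, ndigits=6):
    """Group observers by the start time they imply for each candidate node.

    observed_times: dict observer -> time at which it was infected.
    path_delay: dict candidate -> dict observer -> delay along the
        independent path between candidate and observer.
    Returns dict (candidate, implied_start_time) -> set of observers.
    """
    groups = defaultdict(set)
    for candidate, delays in path_delay.items():
        for observer, delay in delays.items():
            if observer not in observed_times:
                continue  # no observation available for this node
            # Infection time difference: implied start time of the candidate.
            start = round(observed_times[observer] - delay, ndigits)
            groups[(candidate, start)].add(observer)
    return dict(groups)


# Hypothetical observations and path delays.
toy_times = {"o1": 3, "o2": 5, "o3": 4}
toy_delays = {
    "v1": {"o1": 2, "o2": 4, "o3": 1},
    "v2": {"o3": 4},
}
toy_groups = build_coverage_groups(toy_times, toy_delays)
```

In this toy input, o1 and o2 both imply that v1 started at time 1, so they land in the same coverage group, while o3 implies a different start time for v1 and forms a group of its own.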
The specific description is as follows:
1) At the initial time $t_0$, the source is assumed to be known;
2) A known infected node selects its neighbor nodes according to the propagation delay using the shortest-path algorithm and then infects the target node with a certain probability. The time at which target node i first receives the information is recorded as $t_i$; node i then passes the information on to all of its neighbors, so that the time at which each of them receives the information can be expressed as $t_i + \theta_{i+j}$, where $\theta_{i+j}$ denotes the time delay on each edge. Thus, the propagation time of each node can be written as $t_i = t_0 + \min\{\Delta t_{s,1}, \Delta t_{s,2}, \ldots, \Delta t_{s,j}\}$, where $\Delta t_{s,j}$ denotes the propagation delay from the source to the path node.
3) When all nodes have been notified, the entire propagation process ends.
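The propagation process in steps 1) to 3) can be illustrated with a small event-driven simulation. This is a hedged sketch under stated assumptions rather than the procedure of the disclosure: the graph format, the function name `simulate_ic`, and the per-source `start_times` dictionary are hypothetical. Each newly infected node gets a single chance to infect each neighbor with that edge's probability, and a successful attempt arrives after the edge delay, so the recorded time of a node is the earliest arrival over all successful paths.

```python
import heapq
import random


def simulate_ic(graph, start_times, seed=0):
    """Independent cascade diffusion with edge delays.

    graph: dict node -> list of (neighbor, probability, delay).
    start_times: dict source node -> time at which it starts propagating.
    Returns dict node -> time of first infection.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    infected = {}
    heap = [(t, s) for s, t in start_times.items()]
    heapq.heapify(heap)
    while heap:
        t, u = heapq.heappop(heap)
        if u in infected:
            continue  # already infected earlier by another path
        infected[u] = t
        for v, prob, delay in graph.get(u, []):
            # One activation attempt per edge, made when u becomes infected.
            if v not in infected and rng.random() < prob:
                heapq.heappush(heap, (t + delay, v))
    return infected


# Hypothetical toy graph; probabilities of 1.0 and 0.0 make it deterministic.
toy_graph = {
    "a": [("b", 1.0, 2), ("c", 1.0, 5)],
    "b": [("c", 1.0, 1)],
    "c": [("d", 0.0, 1)],
}
toy_infection = simulate_ic(toy_graph, {"a": 0})
```

Node c is reached at time 3 through b rather than at time 5 through the direct edge, which matches the minimum over path delays in the formula for $t_i$. Passing several entries in `start_times` models sources that begin at different moments.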
Source location process: we define the set of k observation nodes, and $t_0$ is the set of initial times, which only the observation nodes know. Back propagation is then performed from each observation node, combining the network topology with the time delays.
Description of the steps:
In $G = (V, E)$, G is the network graph, V is the set of nodes, and E is the set of edges. The set of affected nodes that have been Observed is $O = \{o_1, o_2, \ldots, o_n\}$, and each $o_i$ has a time at which it was affected. The entire propagation process is assumed to take place at discrete time points $t = 0, 1, \ldots, n$, and the time delay of edge $(V_i, V_j)$, that is, the edge from node i to node j, is $T_{i,j}$.
Assuming that all sources start propagating at the same time t = 0, we only need to look, in each $C(i)$, for elements whose infection time difference with respect to $O_k$ is 0; such an element indicates that node i may be the source that influenced $O_k$. If no element with an infection time difference of 0 is found, the node at which the same infection time difference occurs most frequently is taken as the source that influenced $O_k$. Let the nodes other than the Observed ones be $V_1, V_2, \ldots, V_{n-m}$. In the corresponding $C(i)$ ($i = 1, 2, \ldots, n-m$), the set of Observed vertices associated with $\Delta t_k$ is $D(i)$; the aim is to find several sets $D(i)$ that together cover O, which is done with a greedy algorithm.
Assuming that the number of source nodes and the specific nodes are known and that the sources emit influence at different time points, the equal values of Δt are first found and sorted, so that equal values end up adjacent. The Observed nodes with the same value are recorded as one coverage group; $C(i)$ may contain several coverage groups, denoted $D_{i1}, D_{i2}, \ldots, D_{ig}$. Each coverage group $D_{ij} = \{O_{ij1}, O_{ij2}, O_{ij3}, \ldots, O_{ijk}\}$ has a corresponding time $\Delta t_{ij}$, namely the time at which $V_i$ starts propagating. To ensure that every $O_i$ can be covered, we set $G(O_i) = \{D_{gh} \mid O_i \in D_{gh}\}$ for each $O_i$; for each $G(O_i)$, the infected node $V_g$ corresponding to at least one $D_{gh}$ is selected as a seed (source), where $D_{gh}$ denotes the h-th coverage group in node set $C(g)$. We select the minimum of $|G(O_i)|$, take the corresponding $V_i$ as the source node, delete the selected observation point set from the remaining $G(O_i)$, and so on until all observation points have been selected.
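The greedy covering step can be sketched in Python as below. This is an illustrative reading, not the implementation of the disclosure. It takes coverage groups keyed by a hypothetical `(candidate_node, start_time)` pair, repeatedly picks the uncovered observation node with the smallest $|G(O_i)|$, chooses one of its coverage groups as a source, and removes every observation node that group covers. The source text does not say how to choose among several groups of the same observation node, so the rule used here (prefer the group covering the most uncovered observers) is an assumption.

```python
def select_sources(coverage_groups):
    """Greedy source selection over coverage groups.

    coverage_groups: dict (candidate_node, start_time) -> set of observers.
    Returns the list of selected (candidate_node, start_time) keys.
    """
    uncovered = set()
    for members in coverage_groups.values():
        uncovered |= members
    sources = []
    while uncovered:
        # G(o): every coverage group that contains observer o.
        g = {
            o: [key for key, members in coverage_groups.items() if o in members]
            for o in uncovered
        }
        # Observer with the fewest candidate groups, i.e. the smallest |G(o)|.
        o_min = min(uncovered, key=lambda o: (len(g[o]), str(o)))
        # Assumption: among its groups, prefer the one covering most observers.
        best = max(
            g[o_min],
            key=lambda key: (len(coverage_groups[key] & uncovered), str(key)),
        )
        sources.append(best)
        uncovered -= coverage_groups[best]  # delete observers already explained
    return sources


# Hypothetical coverage groups.
toy_cover = {
    ("v1", 0): {"o1", "o2", "o3"},
    ("v2", 1): {"o3", "o4"},
    ("v3", 0): {"o2"},
}
toy_sources = select_sources(toy_cover)
```

In the toy input, o1 and o4 each belong to a single group, so their groups are forced choices; v3 is never selected because o2 is already covered by the group of v1. Every iteration covers at least one observer, so the loop terminates.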
The accuracy of the method is verified below through a specific example. FIGS. 2-4 show the relationship between the observer proportion and the AUROC value under different numbers of sources. We ran the experiments on three virtual networks, ER, WS, and BA. The abscissa is the proportion of selected observers among all nodes and the ordinate is the AUC value; the three graphs show the AUC values for 1, 5, and 10 sources, with the number of nodes n = 200. Overall, the results on WS and BA are better than on the ER network, because the degree distribution of ER is more random. In addition, the AUC values decrease markedly as the number of sources increases. The effect is best when the number of sources is 1; because of the degree-centrality principle, on WS and BA the degree centrality algorithm differs little from the other two algorithms, IMLM and MSL-LP. The MMM algorithm performs poorly because it depends entirely on the initial time and the path distance, and it rarely reaches AUC = 1. As the number of sources grows, the performance of the degree centrality algorithm gradually declines, owing to the increase in the number of nodes and in node degree; in contrast, the advantages of the proposed algorithm gradually emerge, and the MSL-LP algorithm changes little overall and is stable.
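For readers unfamiliar with the metric, AUROC can be computed directly from per-node scores as the probability that a randomly chosen true source is scored above a randomly chosen non-source, with ties counted as one half. The sketch below shows that standard definition only; how each compared algorithm produces its scores is not specified in the text, so the `scores` dictionary and the function name `auroc` are assumptions.

```python
def auroc(scores, true_sources):
    """Pairwise (rank-based) AUROC.

    scores: dict node -> score, higher meaning more likely to be a source.
    true_sources: set of nodes that really are sources.
    """
    pos = [s for node, s in scores.items() if node in true_sources]
    neg = [s for node, s in scores.items() if node not in true_sources]
    if not pos or not neg:
        raise ValueError("need at least one source and one non-source")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # source ranked above non-source
            elif p == q:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four nodes, two of which are true sources.
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
toy_auc = auroc(toy_scores, {"a", "c"})
```

A value of 1 means every true source outranks every non-source, and 0.5 corresponds to random ranking.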
FIGS. 5-7 show the relationship between the observer proportion and the AUROC value under different average degrees; the three graphs correspond to average degrees of 2, 10, and 18. As FIG. 5 shows, ER gives the worst results; on the WS network only the MSL-LP algorithm reaches AUC = 1, and only when there are enough observers, whereas the BA network gives better results because it follows a power-law distribution. In the overall trend, the AUC value moves closer and closer to 1 as the average degree increases. The degree centrality algorithm fluctuates the most, the behavior of the MMM algorithm depends on the initial time set in the experiment and on the choice of observation points, and the other two algorithms are more stable.
FIGS. 8-10 show the relationship between the observer proportion and the F1 score; we choose the case where the average degree k = 10 and the number of sources is 5. The figures show clearly that the improved IMLM algorithm and the MSL-LP algorithm have more accurate hit rates and a higher probability of predicting the source nodes over repeated experiments. Because the MLM algorithm lacks one condition, its overall trend differs little from that of IMLM; it fluctuates more only when the number of results in which the time difference between the observation point's infection time and the path node is 0 becomes large. The degree centrality algorithm is less accurate because it behaves more like a matter of chance, and the MMM algorithm struggles to maintain accuracy because it is entirely constrained by the initial time and the distances between pairs of path points. This shows that our algorithm has a higher hit probability. FIG. 10 shows the predicted real rate (pre) in the virtual and real networks. Similarly, because the prediction of the MMM algorithm is strongly affected by the initial time and the path distance and is demanding in its parameter settings, it is difficult for it to predict accurately when observers are few. In contrast, the MSL-LP algorithm quickly reaches a more accurate prediction level once the observers reach a certain scale. The improved IMLM algorithm is slightly worse, and the MLM algorithm has a lower probability than IMLM because it lacks one condition, but the overall trends of the two algorithms differ little. The accuracy of the degree centrality algorithm fluctuates to some extent, because as observers increase, more nodes sharing the same maximum node degree are obtained.
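The scores discussed above can be computed from a predicted source set and the true source set as in the following sketch. It uses the standard definitions of precision, recall, and F1; reading "pre" as precision in this usual sense is an assumption, since the text does not define it, and the function name is illustrative.

```python
def precision_recall_f1(predicted, actual):
    """Standard set-based precision, recall and F1 for source prediction."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)  # correctly identified sources
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four predicted sources, three true sources, two hits.
toy_p, toy_r, toy_f1 = precision_recall_f1({1, 2, 3, 4}, {2, 3, 5})
```

Here precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.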
Figs. 11-12 are heat maps of the results over the proportion of observers and the number of nodes. Predictions were made on the three networks ER, WS and BA for five network sizes of 100, 200, 300, 400 and 500 nodes, and the results are shown as heat maps; the closer the color value is to 1, the higher the probability of finding the source. Fig. 11 shows the results of the improved IMLM algorithm, and Fig. 12 shows those of the MSL-LP algorithm. The experimental results show that, as the number of nodes increases, a larger proportion of observers gives better results.
Figs. 13-15 plot the AUROC value against the proportion of observers for different numbers of sources; the experiments were carried out on three real networks: USAIR, AS and Facebook. The figures show that when the dataset is large and the proportion of observers is small, all five algorithms perform poorly: many nodes share the same maximum degree, which makes the real source hard to locate and makes such datasets unfavorable to the centrality algorithm. Once the proportion of observers reaches about 0.4, the AUC value begins to peak. When the proportion of observers is low, the IMLM and MSL-LP algorithms differ little, and IMLM can even achieve a better AUC value; as the number of observers grows, the advantage of the MSL-LP algorithm gradually emerges. The MMM algorithm remains the least effective.
Figs. 16-18 plot the prediction precision of each algorithm against the scale of observers on the real networks. Because the prediction of the MMM algorithm is strongly affected by the initial time and the distance between paths and places high demands on parameter settings, it is difficult for it to predict accurately when observers are few. The centrality algorithm sometimes outperforms MLM and IMLM: it depends on node degree, and its effect is most pronounced when the source node has the highest degree. The MSL-LP algorithm gives the best and most stable results because its design takes more factors into account. Because the Facebook dataset is larger and its nodes span a wider range, the prediction accuracy of the MSL-LP algorithm drops slightly when observers are few; as the number of observers increases, however, its advantage gradually emerges and the prediction accuracy rises substantially.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; combinations of features of the above embodiments or in different embodiments are also possible within the spirit of the present disclosure, steps may be implemented in any order, and there are many other variations of the different aspects of one or more embodiments described above which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure one or more embodiments of the present description. Furthermore, the apparatus may be shown in block diagram form in order to avoid obscuring the one or more embodiments of the present description, and also in view of the fact that specifics with respect to implementation of such block diagram apparatus are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
The present disclosure is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the one or more embodiments of the disclosure, are therefore intended to be included within the scope of the disclosure.

Claims (5)

1. A source positioning method based on independent path analysis under an independent cascade model, the method comprising:
determining an initial propagated source node set in a complex network, wherein the complex network is a social network;
performing independent cascade model diffusion from the source node set, infecting nodes throughout the network until no new infected nodes are generated, and extracting the infected nodes to form an infected network;
randomly selecting a plurality of nodes from the infected network as observation nodes, performing back diffusion starting from the observation nodes to form an independent path set, and recording the infection time difference of each observation node;
recording the observation nodes having the same infection time difference as a coverage group, each coverage group having its corresponding observation nodes, wherein the coverage groups are indexed by j and the observation nodes within a coverage group are indexed by k; the infection time difference corresponding to each coverage group is the propagation time from a vertex, namely an observation node of that group, to an infected node that it reaches; dividing the observation nodes into observation point sets; sequentially selecting the observation point set with the smallest cardinality and taking the node corresponding to it as a source node, and deleting the selected observation nodes from the remaining observation point sets, until all observation nodes have been used;
combining the path nodes corresponding to the observation nodes obtained in each selection into a set, thereby obtaining the source seed set.
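The diffusion and observation steps of claim 1 can be illustrated with a small simulation. This is a sketch under stated assumptions, not the patented implementation: the adjacency-dict graph, the single uniform propagation probability `p`, the discrete time steps, and the names `independent_cascade` and `pick_observers` are all hypothetical choices made for illustration.

```python
import random
from collections import deque


def independent_cascade(adj, sources, p, rng):
    """Simulate independent cascade diffusion and return {node: infection time}.

    Each newly infected node gets exactly one chance to infect each
    uninfected neighbour, succeeding with probability p (assumed uniform).
    """
    infected_time = {s: 0 for s in sources}
    frontier = deque(sources)                 # FIFO keeps nodes in infection-time order
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in infected_time and rng.random() < p:
                infected_time[v] = infected_time[u] + 1
                frontier.append(v)
    return infected_time                      # its keys form the infected network


def pick_observers(infected_time, k, rng):
    """Randomly choose up to k infected nodes as observation nodes."""
    nodes = sorted(infected_time)             # sorted for reproducibility with a seeded rng
    return rng.sample(nodes, min(k, len(nodes)))


# Hypothetical 4-node path graph 0-1-2-3 with node 0 as the only source.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
times = independent_cascade(graph, [0], 1.0, random.Random(0))
observers = pick_observers(times, 2, random.Random(1))
```

With `p = 1.0` every contact succeeds, so the infection times on the path graph equal the hop distances from the source. The recorded times of the chosen observers are the raw material for the infection time differences used in the later steps.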
2. The method for source localization based on independent path analysis under independent cascading models of claim 1, wherein the determining the initial propagated set of source nodes in the complex network comprises:
and randomly selecting a plurality of nodes with higher node degree in the complex network as a source node set.
3. The method of independent path analysis based source localization in independent cascading models of claim 1, wherein performing back diffusion to form an independent path set comprises:
for the infected network, defining a propagation probability matrix over the edges, each edge being assigned a propagation probability; when the source propagates along an edge, it is passed on to the next node only if the probability is greater than or equal to the specified value; and randomly assigning a value to each edge as the length of the path, and generating an independent path for each node in the network.
4. The source positioning method based on independent path analysis under independent cascade model as claimed in claim 1, wherein sequentially selecting the observation point set with the smallest cardinality, taking the node corresponding to it as a source node, and deleting the selected observation nodes from the remaining observation point sets comprises:
for each observation node, establishing an observation point set that records where the observation node is found among all coverage groups; for each coverage group, selecting at least one infected node corresponding to its observation nodes as a seed source; and selecting the smallest of the observation point sets, taking the node corresponding to it as a source node, and deleting the observation nodes that have already been selected from the remaining observation point sets.
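One possible reading of this selection step is a greedy loop that repeatedly takes the smallest remaining observation point set. The sketch below follows that reading only; the mapping from candidate nodes to sets of observation nodes, the tie-breaking by label, and the name `select_sources` are hypothetical and are not taken from the patent.

```python
def select_sources(candidate_sets):
    """Greedy smallest-set-first selection (an assumed interpretation).

    candidate_sets maps a candidate node to the set of observation nodes
    associated with it. The candidate with the smallest non-empty set is
    chosen, its observation nodes are removed from every remaining set,
    and the loop repeats until no observation nodes are left.
    """
    remaining = {c: set(obs) for c, obs in candidate_sets.items() if obs}
    sources = []
    while remaining:
        # Smallest set first; ties are broken by candidate label for determinism.
        chosen = min(remaining, key=lambda c: (len(remaining[c]), c))
        used = remaining.pop(chosen)
        sources.append(chosen)
        for c in list(remaining):
            remaining[c] -= used              # delete observation nodes already used
            if not remaining[c]:
                del remaining[c]              # nothing left for this candidate
    return sources


# Hypothetical candidates "a", "b", "c" with observation nodes 1..5.
seed_set = select_sources({"a": {1, 2, 3}, "b": {3}, "c": {4, 5}})
```

In this example "b" is chosen first because its set is smallest, which removes observation node 3 from the set of "a" before the next round.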
5. The independent path analysis based source localization method of claim 1, wherein forming the independent paths applies Dijkstra's shortest path algorithm.
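Dijkstra's algorithm, named in claim 5, computes shortest path lengths from one node to all reachable nodes on a graph with non-negative edge lengths. A minimal sketch follows; the weighted adjacency-dict format, the function name `dijkstra` and the example edge lengths are hypothetical, and how the resulting distances are combined into independent paths is not shown.

```python
import heapq


def dijkstra(adj, start):
    """Return {node: shortest path length from start} for non-negative edge lengths.

    adj maps a node to a list of (neighbour, length) pairs.
    """
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry, a shorter path was found
        for v, length in adj.get(u, ()):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical graph: observation node "o" with randomly assigned edge lengths.
weighted = {"o": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0)], "b": []}
distances = dijkstra(weighted, "o")
```

Here the route from "o" to "b" through "a" has length 3.0, which is shorter than the direct edge of length 4.0.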
CN202110623693.XA 2021-06-04 2021-06-04 Independent path analysis-based source positioning method under independent cascade model Active CN113553541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110623693.XA CN113553541B (en) 2021-06-04 2021-06-04 Independent path analysis-based source positioning method under independent cascade model

Publications (2)

Publication Number Publication Date
CN113553541A CN113553541A (en) 2021-10-26
CN113553541B CN113553541B (en) 2023-10-13

Family

ID=78101969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110623693.XA Active CN113553541B (en) 2021-06-04 2021-06-04 Independent path analysis-based source positioning method under independent cascade model

Country Status (1)

Country Link
CN (1) CN113553541B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105915399A (en) * 2016-06-27 2016-08-31 华侨大学 Network risk source tracing method based on back propagation
CN107682200A (en) * 2017-10-26 2018-02-09 杭州师范大学 A kind of method of the transmission on Internet source positioning based on finite observation
CN111539476A (en) * 2020-04-24 2020-08-14 四川大学 Observation point deployment method for information source positioning based on naive Bayes
CN111985569A (en) * 2020-08-21 2020-11-24 哈尔滨工业大学(威海) Anonymous node positioning method based on multi-source point clustering idea

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9959365B2 (en) * 2015-01-16 2018-05-01 The Trustees Of The Stevens Institute Of Technology Method and apparatus to identify the source of information or misinformation in large-scale social media networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
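Multi-source localization method for online social networks based on subgraph extraction; Zhang Xizhe; Zhang Yubo; Lü Tianyang; Fu Shihai; Zhang Bin; Scientia Sinica Informationis (No. 04); 496-510 *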

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant