CN110138619B - Initial node selection method and system for realizing influence maximization

Info

Publication number
CN110138619B
CN110138619B CN201910448351.1A CN201910448351A CN110138619B CN 110138619 B CN110138619 B CN 110138619B CN 201910448351 A CN201910448351 A CN 201910448351A CN 110138619 B CN110138619 B CN 110138619B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910448351.1A
Other languages
Chinese (zh)
Other versions
CN110138619A (en)
Inventor
周旭
刘勇刚
姜文君
肖国庆
罗文晟
李肯立
李克勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Application filed by Hunan University
Priority to CN201910448351.1A
Publication of CN110138619A
Application granted
Publication of CN110138619B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an initial node selection method for realizing influence maximization. For the scenario in which multiple influences propagate simultaneously in a social network, it introduces conformity (herd behavior) into the propagation process and, for the resulting conformity-aware propagation model, provides a reverse sampling method, an initial node selection method and an initial node estimation method. The conformity-aware propagation model represents the propagation process more realistically, and the initial node selection method selects initial nodes accurately and efficiently, adapts to large-scale network structures, and improves the timeliness of initial node selection.

Description

Initial node selection method and system for realizing influence maximization
Technical Field
The invention belongs to the technical field of computer information, and particularly relates to an initial node selection method and system for realizing influence maximization.
Background
The development of the internet has not only made life more convenient but has also changed the way people live and work. With the rise of network applications such as Facebook and microblogs and the spread of mobile network terminals, people who are scattered across different regions, hold different beliefs and belong to different countries and organizations are connected in online social networks, forming a huge information exchange network. Information dissemination in a large-scale social network carries great economic and social value (for example, it can be used for advertisement marketing and policy promotion), so how to select initial nodes to achieve influence maximization (IM for short) has become an important research problem in the technical field of social networks.
Existing initial node selection methods are mainly based on the Independent Cascade (IC) model or the Linear Threshold (LT) model, and determine the influence of the initial nodes using simulation-based methods, heuristic methods, or reverse influence sampling. However, the models used by these methods have a non-negligible technical problem: because people's behavior is not considered, the model is not realistic enough, so the computed influence value is not accurate enough and the selected initial nodes are not ideal.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides an initial node selection method and system for realizing influence maximization. It aims to solve the technical problem that the obtained influence value is not accurate enough, and the selected initial nodes are not ideal enough, because the models adopted by existing initial node selection methods do not consider people's behavior. In addition, the propagation model provided by the invention is more realistic and scientifically grounded and has a wider range of application. Finally, the initial node selection method provided by the invention improves the timeliness of initial node selection.
To achieve the above object, according to one aspect of the present invention, there is provided an initial node selection method for maximizing influence, including the steps of:
(1) constructing an influence graph according to the established propagation model, and generating, from the constructed influence graph, a first initial node set S_C used to propagate the other, competing influences;
(2) obtaining a threshold T_0 with the SSA algorithm according to preset precision parameters ε and δ, performing T_0 rounds of reverse adoption sampling on the influence graph obtained in step (1), and putting the T_0 resulting samples into a sample set [Figure GDA0002377762310000021];
(3) according to the sample set [Figure GDA0002377762310000022] returned in step (2), iteratively selecting from the influence graph the k nodes with the largest marginal gain as a second initial node set S, and obtaining a biased estimate [Figure GDA0002377762310000023] of the adoption of the second initial node set S, where 0 < k < n and n is the total number of nodes in the network;
(4) setting a counter c10 = 0, which counts the number of times the reverse sampling of step (5) has been performed, and setting the total adoption of the second initial node set S obtained in step (3) to SUM_2 = 0;
(5) performing reverse sampling on the influence graph, calculating the adoption A_c10 obtained in this sampling by the second initial node set S obtained in step (3), and updating the total adoption of the second initial node set S to SUM_2 = SUM_2 + A_c10;
(6) determining whether the counter c10 is less than a threshold T_2 and the total adoption SUM_2 of the second initial node set S is less than a threshold T_3; if so, setting c10 = c10 + 1 and returning to step (5); otherwise outputting the estimate [Figure GDA0002377762310000024] and entering step (7);
(7) determining whether the biased estimate [Figure GDA0002377762310000025] of the second initial node set S obtained in step (3) and the unbiased estimate [Figure GDA0002377762310000026] of the second initial node set S obtained in step (6) satisfy [Figure GDA0002377762310000027]; if so, directly outputting the second initial node set S as the result and ending the process; otherwise updating the number of samples used in step (2) to T_0 = 2*T_0 and returning to step (2).
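The verification loop of steps (4) to (6) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: `sample_adoption` stands in for one reverse sampling of step (5), the names `verify_adoption`, `t2` and `t3` are hypothetical, and the scaling `n * SUM_2 / samples` is only an assumed form of the unbiased estimate, because the corresponding formula image is not reproduced in this text.

```python
from typing import Callable, Tuple


def verify_adoption(sample_adoption: Callable[[], float], n: int,
                    t2: int, t3: float) -> Tuple[float, int]:
    """Steps (4)-(6): keep sampling while c10 < T_2 and SUM_2 < T_3."""
    c10 = 0       # step (4): number of completed loop-backs to step (5)
    sum2 = 0.0    # step (4): total adoption SUM_2 of the candidate set S
    while True:
        sum2 += sample_adoption()        # step (5): one reverse sampling
        if c10 < t2 and sum2 < t3:       # step (6): both limits not yet reached
            c10 += 1
            continue
        break
    samples = c10 + 1                    # every loop pass drew exactly one sample
    # Assumed estimator: average adoption per sample scaled by the node count n.
    return n * sum2 / samples, samples
```

For example, with a constant adoption of 0.5 per sample, `verify_adoption(lambda: 0.5, 10, 100, 3.0)` stops once SUM_2 reaches T_3, while a small `t2` stops the loop on the sample count instead.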
Preferably, the influence graph is represented as G = (V, E, p), where V is the set of nodes, E is the set of directed edges, and p is a function giving the pre-assigned activation probability of every edge: each edge (u, v) has a pre-assigned activation probability p(u, v) ∈ [0, 1], which is the probability that node u activates node v.
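One possible in-memory layout for such a graph is sketched below in Python. The class name `InfluenceGraph` and its attributes are hypothetical; the text only fixes the triple G = (V, E, p). Storing in-edges alongside out-edges is a design assumption made here because reverse sampling walks edges backwards.

```python
from collections import defaultdict


class InfluenceGraph:
    """Directed graph G = (V, E, p) with an activation probability per edge."""

    def __init__(self) -> None:
        self.nodes = set()
        self.out_edges = defaultdict(list)  # u -> [(v, p(u, v))]
        self.in_edges = defaultdict(list)   # v -> [(u, p(u, v))]

    def add_edge(self, u, v, p: float) -> None:
        # p(u, v) must lie in [0, 1]: the probability that u activates v.
        if not 0.0 <= p <= 1.0:
            raise ValueError("activation probability must be in [0, 1]")
        self.nodes.update((u, v))
        self.out_edges[u].append((v, p))
        self.in_edges[v].append((u, p))
```

Usage: `g = InfluenceGraph(); g.add_edge("u", "v", 0.3)` records that u activates v with probability 0.3, and `g.in_edges["v"]` then lists the in-neighbors of v.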
Preferably, the propagation process of the propagation model is as follows:
first, at time 0, the different influences activate the respective initial nodes to start the propagation process. Different influences can select the same node as an initial node;
then, at time t, each node that was activated at time t-1 becomes active and attempts to activate its neighbor nodes that are not yet activated; if an activation attempt succeeds, the newly activated node receives all the influences carried by the node that activated it; if the attempt fails, the neighbor node remains inactive; afterwards, each node in the activated state adopts one of the influences it has received, according to the probability defined by the conformity adoption function, and finally enters the adopted state;
the probability is defined by the conformity function h(u, I) as follows: [Figure GDA0002377762310000031]
where h(u, I_i) denotes the probability that node u adopts influence I_i, I is the set of all influences, and NA(u, I_i) denotes the set of neighbor nodes that activated node u and propagated influence I_i to it.
Finally, the propagation process terminates when no new nodes are activated.
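The adoption decision of a single activated node can be illustrated with the short Python sketch below. The exact form of h(u, I) is not reproduced in this text, so the sketch assumes a proportional rule, h(u, I_i) = |NA(u, I_i)| / Σ_j |NA(u, I_j)|, which is only one function consistent with the symbol definitions above. The function names and the dictionary layout are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional, Set


def adoption_probabilities(activators: Dict[str, Set[Hashable]]) -> Dict[str, float]:
    """Map each influence I_i to an assumed h(u, I_i).

    activators[I_i] plays the role of NA(u, I_i): the neighbors that
    activated node u and propagated influence I_i to it.
    """
    total = sum(len(nodes) for nodes in activators.values())
    if total == 0:
        return {}
    # Assumption: adoption probability is proportional to |NA(u, I_i)|.
    return {inf: len(nodes) / total for inf, nodes in activators.items()}


def adopt(activators: Dict[str, Set[Hashable]], rng: random.Random) -> Optional[str]:
    """Draw the single influence that node u adopts, or None if it received none."""
    probs = adoption_probabilities(activators)
    if not probs:
        return None
    influences = list(probs)
    weights = [probs[i] for i in influences]
    return rng.choices(influences, weights=weights, k=1)[0]
```

Under this assumption, a node activated by three neighbors carrying I1 and one neighbor carrying I2 adopts I1 with probability 0.75, which captures the conformity idea that the majority among a node's activators is the more likely choice.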
The sample in step (2) is the data obtained by a single reverse sampling. It includes a node set R, the set {d(u) | u ∈ R} of distances between each node in R and the node w, the set {NA(u) | u ∈ R} of activated node sets of the nodes in R, the distance d_C between the node w and the first initial node set S_C, and the total number of times Time_C that the node w is activated by the nodes in the first initial node set S_C.
Preferably, the process of a single reverse sampling in step (2) comprises the following substeps:
(2-1) randomly selecting a node w from the influence graph, setting the distance d(w) between the node w and itself to 0, putting w into the node set R, initializing the distance d_C between the first initial node set S_C and the node w, and setting a variable curDist = 0, where curDist denotes the distance between the currently processed node and the node w, called the current distance for short;
(2-2) determining whether the node w is an element of the first initial node set S_C; if so, recording the total number of times Time_C that the node w is activated by the competing influences and the distance d_C between the first initial node set S_C and the node w, and going to step (2-17); otherwise going to step (2-3);
(2-3) setting a counter c1 = 1 and the current distance curDist = curDist + 1, and determining whether the value of c1 is greater than the total number of neighbor nodes of the node w; if so, going to step (2-17), otherwise going to step (2-4);
(2-4) determining, according to the probability that the c1-th in-neighbor node w_c1 of the node w activates the node w, whether w_c1 successfully activates w; if so, recording the distance d(w_c1) between w_c1 and w as curDist, putting w_c1 into its activated node set NA(w_c1), putting w_c1 into a node set V, and going to step (2-5); otherwise going to step (2-6);
(2-5) determining whether the node w_c1 is an element of the first initial node set S_C; if so, setting the distance d_C between S_C and the node w to the current distance curDist; in either case going to step (2-6);
(2-6) determining whether the value of c1 is equal to the total number of neighbor nodes of the node w; if so, going to step (2-8), otherwise going to step (2-7);
(2-7) setting the counter c1 = c1 + 1, and returning to step (2-4);
(2-8) determining whether the node set V intersects the first initial node set S_C or the node set V is empty; if so, calculating and recording the total number of times Time_C that the node w is activated by the competing influences propagated by the nodes in the intersection of V and S_C, and going to step (2-17); otherwise putting all nodes of the node set V into the node set R, and going to step (2-9);
(2-9) setting a counter c2 = 1 and the current distance curDist = curDist + 1;
(2-10) selecting the c2-th node w_c2 from the node set V, setting a counter c3 = 1, and determining whether the value of c3 is greater than the total number of neighbor nodes of the node w_c2; if so, going to step (2-14), otherwise going to step (2-11);
(2-11) determining whether the c3-th neighbor node w_c3 of the node w_c2 is not in the node set R and, according to the probability that w_c3 activates w_c2, whether w_c3 successfully activates w_c2; if so, setting the distance d(w_c3) between w_c3 and the node w equal to curDist, putting the elements of the activated node set NA(w_c2) of w_c2 into the activated node set NA(w_c3) of w_c3, putting w_c3 into a node set V_next, and going to step (2-12); otherwise going to step (2-13);
(2-12) determining whether the node w_c3 is an element of the first initial node set S_C; if so, setting the distance d_C between S_C and the node w equal to curDist; in either case going to step (2-13);
(2-13) determining whether the value of the counter c3 is greater than the total number of neighbor nodes of the node w_c2; if so, going to step (2-14); otherwise setting the counter c3 = c3 + 1 and returning to step (2-11);
(2-14) determining whether the value of the counter c2 is equal to the total number of nodes in the node set V; if so, going to step (2-15); otherwise setting the counter c2 = c2 + 1 and returning to step (2-10);
(2-15) determining whether the node set V_next intersects the first initial node set S_C or the node set V_next is empty; if so, obtaining the total number of times Time_C that the node w is activated by the competing influences propagated by the nodes in the intersection of V_next and S_C, and going to step (2-17); otherwise going to step (2-16);
(2-16) replacing all nodes of the node set V with all nodes of the node set V_next, putting all nodes of the node set V into the node set R, and returning to step (2-8);
(2-17) obtaining the node set R, the set {d(u) | u ∈ R} of distances between each node in R and the node w, the set {NA(u) | u ∈ R} of activated node sets of the nodes in R, the distance d_C between the node w and the first initial node set S_C, and the total number of times Time_C that the node w is activated by the nodes in the first initial node set S_C.
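The layer-by-layer structure of substeps (2-1) to (2-17) can be sketched in Python as follows. This is a simplified illustration, not the patented procedure: it tracks only the node set R (as the keys of `dist`), the distances d(u) and the distance d_C, and it omits the activated node sets NA(u) and the count Time_C. The function name `reverse_sample` and the `in_edges` dictionary layout are assumptions.

```python
import math
import random
from typing import Dict, Hashable, List, Set, Tuple

InEdges = Dict[Hashable, List[Tuple[Hashable, float]]]  # v -> [(u, p(u, v))]


def reverse_sample(in_edges: InEdges, nodes: List[Hashable],
                   s_c: Set[Hashable], rng: random.Random):
    """One reverse sampling: grow layers backwards from a random node w."""
    w = rng.choice(nodes)                     # (2-1) random root
    dist = {w: 0}                             # d(u) for every u placed in R
    d_c = 0 if w in s_c else math.inf         # distance from S_C to w
    frontier = [w]
    cur_dist = 0
    # Stop once a layer has reached S_C or no node was activated.
    while frontier and d_c == math.inf:
        cur_dist += 1
        next_frontier = []
        for v in frontier:
            for u, p in in_edges.get(v, []):
                # The in-neighbor u activates v with probability p(u, v).
                if u in dist or rng.random() >= p:
                    continue
                dist[u] = cur_dist
                next_frontier.append(u)
                if u in s_c:
                    d_c = cur_dist            # the competitor is reached here
        frontier = next_frontier              # V <- V_next
    return w, dist, d_c
```

On the chain a → b → c with all probabilities 1 and root c, a competitor set {"b"} stops the sampling at distance 1 without ever visiting a, while an empty competitor set lets the sampling run until no new node is activated.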
Preferably, iteratively selecting the k nodes with the largest marginal gain as the initial node set in step (3) comprises the following substeps:
(3-1) setting [Figure GDA0002377762310000061], setting a counter c4 = 1, setting the initial adoption of each node of the influence graph on the sample set [Figure GDA0002377762310000062] to 0, and setting the total adoption of the second initial node set S on the sample set [Figure GDA0002377762310000063] to SUM_1 = 0;
(3-2) setting a counter c5 = 1;
(3-3) taking the c5-th sample R_c5 out of the sample set [Figure GDA0002377762310000064], and setting a counter c6 = 1;
(3-4) taking the c6-th node w_c6 out of the sample R_c5, and determining whether the distance d(w_c6) between w_c6 and the node w in R_c5 is less than d_C; if so, setting the adoption of w_c6 on R_c5 to 1, adding 1 to the total adoption of w_c6, and going to step (3-5); otherwise setting the adoption of w_c6 on R_c5 to [Figure GDA0002377762310000065], updating the total adoption of w_c6 on the sample set [Figure GDA0002377762310000066] to its total adoption on the sample set [Figure GDA0002377762310000067] plus [Figure GDA0002377762310000068], and going to step (3-5); where NA(w_c6) and Time_C denote, respectively, the activated node set of w_c6 in the sample R_c5 and the total number of times the node w of the sample R_c5 is activated by the influences propagated by the nodes in the first initial node set S_C;
(3-5) determining whether the counter c6 is less than |R_c5|; if so, setting c6 = c6 + 1 and returning to step (3-4); otherwise going to step (3-6);
(3-6) determining whether the counter c5 is less than [Figure GDA0002377762310000069]; if so, setting c5 = c5 + 1 and returning to step (3-3); otherwise going to step (3-7);
(3-7) using a max-heap, arranging all nodes of the influence graph in descending order of each node's total adoption on [Figure GDA00023777623100000610];
(3-8) determining whether the counter c4 is less than the preset threshold k; if so, taking out the node w_c4 with the largest total adoption on [Figure GDA00023777623100000611], adding w_c4 to the second initial node set S, setting SUM_1 = SUM_1 + A(w_c4), where A(w_c4) is the total adoption of w_c4 on [Figure GDA00023777623100000612], removing w_c4 from the max-heap, and going to step (3-9); otherwise obtaining the initial node set S and the biased estimate [Figure GDA00023777623100000613] of its adoption, and going to step (3-22);
(3-9) taking from the sample set [Figure GDA00023777623100000614] all samples that contain the node w_c4 to form a sample set [Figure GDA00023777623100000615], and setting a counter c7 = 1;
(3-10) taking out the c7-th sample R_c7 in [Figure GDA0002377762310000071], and determining whether the sample R_c7 has been marked as determined-adopted; if so, going to step (3-21), otherwise going to step (3-11);
(3-11) determining whether the distance d(w_c4) between w_c4 and the node w of the sample R_c7 is less than the distance d_C between the first initial node set S_C and the node w in R_c7; if so, marking the sample R_c7 as determined-adopted and going to step (3-12); otherwise going to step (3-15);
(3-12) setting a counter c8 = 1;
(3-13) taking out the c8-th node w_c8 in the sample R_c7, updating the total adoption of w_c8 on the sample set [Figure GDA0002377762310000072] to its total adoption on the sample set [Figure GDA0002377762310000073] minus the adoption of w_c8 on R_c7, and updating the position of w_c8 in the max-heap;
(3-14) determining whether the counter c8 is less than |R_c7|; if so, setting c8 = c8 + 1 and going to step (3-13); otherwise going to step (3-21);
(3-15) setting a counter c9 = 1;
(3-16) taking out the c9-th node w_c9 in the sample R_c7, and determining whether the distance d(w_c9) between w_c9 and the node w of the sample R_c7 is less than the distance d_C between the first initial node set S_C and the node w in R_c7; if so, going to step (3-17), otherwise going to step (3-18);
(3-17) updating the adoption of the node w_c9 on R_c7 to its adoption on R_c7 minus the adoption of the node w_c4 on R_c7, and updating the total adoption of w_c9 on the sample set [Figure GDA0002377762310000074] to its total adoption on the sample set [Figure GDA0002377762310000075] minus the adoption of w_c4 on R_c7;
(3-18) updating the total adoption of the node w_c9 on the sample set [Figure GDA0002377762310000076] to its total adoption on the sample set [Figure GDA0002377762310000077] minus the adoption of w_c9 on R_c7; updating the adoption of w_c9 on R_c7 to [Figure GDA0002377762310000078], where [Figure GDA0002377762310000079] is the size of the union of the activated node sets of the nodes in the set S, i.e., the total number of activations of the second initial node set S on R_c7; and updating again the total adoption of w_c9 on the sample set [Figure GDA00023777623100000710] to its total adoption on the sample set [Figure GDA00023777623100000711] plus the updated adoption of w_c9 on R_c7;
(3-19) updating the position of the node w_c9 in the max-heap;
(3-20) determining whether the counter c9 is less than |R_c7|; if so, setting c9 = c9 + 1 and going to step (3-16); otherwise going to step (3-21);
(3-21) determining whether the counter c7 is less than [Figure GDA0002377762310000081]; if so, setting c7 = c7 + 1 and going to step (3-10); otherwise going to step (3-8);
(3-22) determining whether SUM_1 is greater than a threshold T_1; if so, entering step (4); otherwise updating the number of samples used in step (2) to T_0 = 2*T_0 and returning to step (2). According to the SSA algorithm, the threshold T_1 is calculated by the formula [Figure GDA0002377762310000082], where [Figure GDA0002377762310000083] and ε_2 = ε_3 = (1 - 1/e)^(-1) · ε/2.
Preferably, the adoption f(S) of the second initial node set S is obtained by the following formula: [Figure GDA0002377762310000084]
where S is the initial node set, n is the total number of nodes in the network, A_i is the adoption gain of the i-th sampling, [Figure GDA0002377762310000085] is the number of actual samples, and [Figure GDA0002377762310000086] is the average adoption over all samples of [Figure GDA0002377762310000087].
Preferably, performing reverse sampling on the influence graph in step (5) and calculating the adoption A_c10 obtained in this sampling by the second initial node set S obtained in step (3) comprises the following substeps:
(5-1) randomly selecting a node z from the influence graph, putting the node z into a node set R, setting the distance d_C between the first initial node set S_C and the node z to infinity, setting the distance d_S between the second initial node set S and the node z to infinity, setting the current distance curDist = 0, and setting the adoption of the second initial node set S in this sampling to A_c10 = 0;
(5-2) determining whether the node z is an element of the first initial node set S_C; if so, setting d_C = curDist; in either case going to step (5-3);
(5-3) determining whether the node z is an element of the second initial node set S; if so, setting d_S = curDist; in either case going to step (5-4);
(5-4) determining whether d_C is equal to curDist; if so, going to step (5-5), otherwise going to step (5-6);
(5-5) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sampling is [Figure GDA0002377762310000091], where Time_C is the total number of times the node z is activated by the competing influences, and going to step (5-30); otherwise the adoption of the second initial node set S in this sampling is A_c10 = 0, and going to step (5-30);
(5-6) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sampling is A_c10 = 1, and going to step (5-30); otherwise going to step (5-7);
(5-7) setting a counter c11 = 1 and the current distance curDist = curDist + 1;
(5-8) determining whether the value of c11 is greater than the total number of neighbor nodes of the node z; if so, going to step (5-13), otherwise going to step (5-9);
(5-9) determining, according to the probability that the c11-th in-neighbor node z_c11 of the node z activates the node z, whether z_c11 successfully activates z; if so, putting z_c11 into its activated node set NA(z_c11), putting z_c11 into a node set V, and going to step (5-10); otherwise going to step (5-11);
(5-10) determining whether the node z_c11 is an element of the first initial node set S_C; if so, setting d_C = curDist and adding z_c11 to the node set Set_C; in either case going to step (5-11);
(5-11) determining whether the node z_c11 is an element of the second initial node set S; if so, setting d_S = curDist and adding z_c11 to the node set Set_S; in either case going to step (5-12);
(5-12) setting the counter c11 = c11 + 1, and going to step (5-8);
(5-13) determining whether d_C is equal to curDist; if so, going to step (5-14), otherwise going to step (5-15);
(5-14) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sampling is [Figure GDA0002377762310000101], where [Figure GDA0002377762310000102] is the total number of times the node z is activated by the competing influences and InfSet_C(u) is the set of competing influences propagated by a node u in the node set Set_C, and going to step (5-30); otherwise the adoption of the second initial node set S in this sampling is A_c10 = 0, and going to step (5-30);
(5-15) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sampling is A_c10 = 1, and going to step (5-30); otherwise going to step (5-16);
(5-16) if the node set V is not empty, d_S is not equal to curDist, and d_C is not equal to curDist, putting all nodes of the node set V into the node set R and going to step (5-17); otherwise going to step (5-27);
(5-17) setting a counter c12 = 1 and the current distance from the node z to curDist = curDist + 1;
(5-18) determining whether the value of c12 is greater than the total number of nodes in the node set V; if so, going to step (5-26), otherwise going to step (5-19);
(5-19) selecting the c12-th node z_c12 from the node set V, and setting a counter c13 = 1;
(5-20) determining whether the value of c13 is greater than the total number of neighbor nodes of the node z_c12; if so, going to step (5-25), otherwise going to step (5-21);
(5-21) determining whether the c13-th neighbor node z_c13 of the node z_c12 is not in the node set R and, according to the probability that z_c13 activates z_c12, whether z_c13 successfully activates z_c12; if so, putting the elements of the activated node set NA(z_c12) of z_c12 into the activated node set NA(z_c13) of z_c13, putting z_c13 into a node set V_next, and going to step (5-22); otherwise going to step (5-24);
(5-22) determining whether the node z_c13 is an element of the first initial node set S_C; if so, setting d_C = curDist and adding z_c13 to the node set Set_C; in either case going to step (5-23);
(5-23) determining whether the node z_c13 is an element of the second initial node set S; if so, setting d_S = curDist and adding z_c13 to the node set Set_S; in either case going to step (5-24);
(5-24) setting the counter c13 = c13 + 1, and returning to step (5-20);
(5-25) setting the counter c12 = c12 + 1, and returning to step (5-18);
(5-26) replacing all nodes of the node set V with all nodes of the node set V_next, and returning to step (5-16);
(5-27) determining whether d_C is equal to curDist; if so, going to step (5-28), otherwise going to step (5-29);
(5-28) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sampling is [Figure GDA0002377762310000111], where [Figure GDA0002377762310000112] is the size of the union of the activated node sets of the nodes in the set S, and going to step (5-30); otherwise the adoption of the second initial node set S in this sampling is A_c10 = 0, and going to step (5-30);
(5-29) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sampling is A_c10 = 1; in either case going to step (5-30);
(5-30) returning the adoption A_c10 of this sampling.
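The decision pattern that recurs in steps (5-4) to (5-6), (5-13) to (5-15) and (5-27) to (5-29) compares the two distances d_S and d_C at the layer where the sampling stops. A compact Python sketch is given below. The fractional value used in the tie case is an assumption, because the corresponding formula images are not reproduced in this text; the sketch takes it to be the share of S among all activations of the root. The function and parameter names are hypothetical.

```python
import math


def sample_adoption(d_s: float, d_c: float, na_s: int, time_c: int) -> float:
    """Adoption A_c10 of the candidate set S in one reverse sample.

    d_s, d_c: distances from S and from the competitor set S_C to the root z.
    na_s:     number of activations of z attributed to nodes of S.
    time_c:   number of times z is activated by the competing influences.
    """
    if d_s == math.inf:
        return 0.0            # S never reaches the root in this sample
    if d_s < d_c:
        return 1.0            # S reaches the root strictly first
    if d_s > d_c:
        return 0.0            # the competitor reaches the root first
    total = na_s + time_c
    # Tie: assumed proportional share between S and the competing influences.
    return na_s / total if total else 0.0
```

For example, `sample_adoption(2, 2, 3, 1)` models a tie at distance 2 in which S accounts for three of the four activations of the root.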
According to another aspect of the present invention, there is provided an initial node selection system for maximizing influence, including:
a first module, configured to construct an influence graph according to the established propagation model, and to generate, from the constructed influence graph, a first initial node set S_C used to propagate the other, competing influences;
a second module, configured to obtain a threshold T_0 with the SSA algorithm according to preset precision parameters ε and δ, perform T_0 rounds of reverse adoption sampling on the influence graph obtained by the first module, and put the T_0 resulting samples into a sample set [Figure GDA0002377762310000113];
a third module, configured to iteratively select from the influence graph, according to the sample set [Figure GDA0002377762310000114] returned by the second module, the k nodes with the largest marginal gain as a second initial node set S, and to obtain a biased estimate [Figure GDA0002377762310000121] of the adoption of the second initial node set S, where 0 < k < n;
a fourth module, configured to set a counter c10 = 0, which counts the number of times the reverse sampling of the fifth module has been performed, and to set the total adoption of the second initial node set S obtained by the third module to SUM_2 = 0;
a fifth module, configured to perform reverse sampling on the influence graph, calculate the adoption A_c10 obtained in this sampling by the second initial node set S obtained by the third module, and update the total adoption of the second initial node set S to SUM_2 = SUM_2 + A_c10;
a sixth module, configured to determine whether the counter c10 is less than a threshold T_2 and the total adoption SUM_2 of the second initial node set S is less than a threshold T_3; if so, to set c10 = c10 + 1 and return to the fifth module; otherwise to output the unbiased estimate [Figure GDA0002377762310000122] of the adoption of the second initial node set S and enter the seventh module;
a seventh module, configured to determine whether the biased estimate [Figure GDA0002377762310000123] of the second initial node set S obtained by the third module and the unbiased estimate [Figure GDA0002377762310000124] of the second initial node set S obtained by the sixth module satisfy [Figure GDA0002377762310000125]; if so, to directly output the second initial node set S as the result and end the process; otherwise to update the number of samples used by the second module to T_0 = 2*T_0 and return to the second module.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) the crowd-aware propagation model provided by the invention takes into account the competition among multiple influences during propagation and, for the first time, introduces conformity (herd) behavior as the decision basis in the process of adopting an influence, which makes the propagation model more reasonable, the computed influence values more accurate, and the finally selected initial nodes closer to the ideal choice;
(2) the invention designs a reverse sampling method for the crowd-aware propagation model, which solves the problem that conventional reverse sampling methods are not suitable for sampling under multiple influences and thereby widens the range of application of reverse sampling;
(3) the initial node selection method provided by the invention updates the adoption gain of the nodes incrementally, which avoids a large amount of repeated calculation and improves computational efficiency.
Drawings
FIG. 1 is a flow chart of the initial node selection method of the present invention for realizing influence maximization;
FIG. 2 is a state transition diagram of nodes in a crowd-aware propagation model;
FIG. 3 illustrates an example of a crowd-aware propagation model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
To address the shortcomings described in the background, the invention first proposes a crowd-aware propagation model designed for the propagation of multiple competing influences, in which people's conformity behavior is introduced into the propagation process to select the influence that is finally adopted; then, based on the crowd-aware propagation model, it formulates the crowd-aware adoption maximization problem; finally, it designs a reverse adoption sampling method and an initial node selection method for the crowd-aware propagation model and, in combination with the SSA framework, computes an initial node set with a quality guarantee.
The technical terms appearing in the present invention are explained and illustrated in detail below:
Conformity (herd behavior): the behavior of an individual who, under the influence of the behavior of the surrounding group, brings his or her own perception, judgment and understanding into line with public opinion or with the majority.
Adoption: the number of nodes in the social network that adopt a given influence, or the expectation of that number.
Conformity adoption function: a probability function defined in the propagation model for scenarios in which several competing influences propagate in a social network at the same time:
Figure GDA0002377762310000131
The conformity adoption probability function satisfies 0 ≤ h(u, I_i) ≤ 1 and Σ_{1≤i≤|I|} h(u, I_i) = 1.
It gives the probability with which a node in the social network selects and adopts one influence: the more times a node receives an influence, the higher the probability that it adopts that influence, which reflects the conformity of the node's choice.
In-neighbor node: information propagation between nodes in a social network is directional, so the edges between nodes in the influence graph are also directed. For any node v, if there is a directed edge (u, v) pointing from node u to node v, then node u is an in-neighbor node of node v.
Out-neighbor node: similarly, for any node u, if there is a directed edge (u, v) pointing from node u to node v, then node v is an out-neighbor node of node u.
Crowd-aware propagation model: the model that specifies the propagation rules of the various influences in the influence graph. Each node has two behaviors, activation and adoption, which are mainly controlled by the activation probability function p(e) and the conformity adoption probability function h(u, I), respectively.
In the present invention, the activation probability function p(e) can be set in three ways: uniform probability, weighting by the reciprocal of the in-degree, and one of three values. Under the uniform setting, every edge in the network is given the same activation probability, usually 0.1. Under the in-degree-reciprocal setting, the activation probability of an edge is the reciprocal of the in-degree of its end point, i.e. p(u, v) = 1/|N_in(v)|, where e = (u, v), v is the end point of the edge, and N_in(v) is the set of all in-neighbors of v. Under the one-of-three setting, the activation probability of each edge is chosen at random from the three values 0.1, 0.01 and 0.001.
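The three settings of p(e) can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the described method: the function names, the edge-list representation and the `rng` parameter are hypothetical, and the edge list is assumed to contain no duplicate edges, so that counting incoming edges gives |N_in(v)|.

```python
import random


def uniform_probabilities(edges, p=0.1):
    """Uniform setting: every edge gets the same activation probability."""
    return {edge: p for edge in edges}


def in_degree_probabilities(edges):
    """In-degree-reciprocal setting: p(u, v) = 1 / |N_in(v)|."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


def one_of_three_probabilities(edges, rng=None, values=(0.1, 0.01, 0.001)):
    """One-of-three setting: each edge draws its probability from `values`."""
    rng = rng or random.Random()
    return {edge: rng.choice(values) for edge in edges}


# Hypothetical toy edge list (u, v): u points to v.
edges = [("a", "e"), ("b", "e"), ("c", "e"), ("c", "f")]
p_uniform = uniform_probabilities(edges)
p_in_degree = in_degree_probabilities(edges)
p_random = one_of_three_probabilities(edges, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the random setting reproducible across runs.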
The conformity adoption probability function h(u, I) is defined as follows:
Figure GDA0002377762310000141
where h(u, I_i) denotes the probability that node u adopts influence I_i, and Freq(I_i) denotes the number of times that u has received influence I_i.
The following formula is used in the actual calculation:
Figure GDA0002377762310000142
where NA(u, I_i) denotes the set of in-neighbors that activated u and propagated influence I_i to it.
The conformity adoption probability function satisfies 0 ≤ h(u, I_i) ≤ 1 and Σ_{1≤i≤|I|} h(u, I_i) = 1.
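A small sketch of how h(u, I_i) could be evaluated is given below. It assumes, as the surrounding text indicates, that h(u, I_i) is the number of times I_i was received divided by the total number of receptions; the defining formula itself is available only as a figure, so this reading, the function names and the list-based input are assumptions.

```python
import random
from collections import Counter
from fractions import Fraction


def adoption_probabilities(receptions):
    """Return h(u, I_i) for every influence that node u has received.

    `receptions` holds one influence label per reception, so an influence
    that was received twice appears twice in the list.
    """
    freq = Counter(receptions)
    total = sum(freq.values())
    return {influence: Fraction(count, total) for influence, count in freq.items()}


def choose_influence(receptions, rng):
    """Draw the adopted influence.

    A uniform pick from the reception list selects each influence with
    probability proportional to Freq(I_i).
    """
    if not receptions:
        raise ValueError("the node has not received any influence")
    return rng.choice(receptions)


# Hypothetical node that received influences 1 and 4 twice, 2 and 3 once.
probs = adoption_probabilities([1, 1, 4, 4, 2, 3])
adopted = choose_influence([1, 1, 4, 4, 2, 3], random.Random(0))
```

Exact fractions are used so that the two stated properties (values in [0, 1], sum equal to 1) can be checked without rounding error.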
As shown in fig. 1, the initial node selection method for maximizing influence according to the present invention includes the following steps:
(1) constructing an influence graph according to the established propagation model, and generating, from the constructed influence graph, a first initial node set S_C for propagating the other influences (i.e., the competing influences);
the social network is modeled as a directed influence graph G = (V, E, p), where V denotes the set of nodes, E denotes the set of directed edges, and p denotes the function that pre-assigns an activation probability to every edge: each edge (u, v) has a pre-assigned activation probability p(u, v) ∈ [0, 1], which is the probability that node u activates node v.
For the first initial node set S_C, it is assumed that there are q ≥ 1 competing influences in the social network. Each influence corresponds to one initial node set, and the initial node sets of different influences may overlap, i.e., one node may serve as an initial node of several influences. The first initial node set S_C can be taken from the actual initial nodes of a real network, or it can be set manually by a heuristic (for example, according to the out-degree of the nodes). For example, when the number of competing influences is q = 2 and 50 initial nodes are set for each competing influence, the 25 nodes with the largest out-degree may be selected as initial nodes common to both competing influences, and the following nodes are then assigned to the two influences in decreasing order of out-degree, each node being assigned to one influence.
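The out-degree heuristic from the example can be sketched as follows. The function name, its parameters and the round-robin assignment of the non-shared nodes are assumptions made for illustration; taking S_C as the union of the per-influence seed sets is likewise an assumption.

```python
def assign_competing_seeds(out_degree, per_influence=50, shared=25, q=2):
    """Assign initial nodes to q competing influences by out-degree.

    The `shared` highest-out-degree nodes seed every influence; the next
    nodes are handed out one influence at a time until each influence
    has `per_influence` initial nodes (or the nodes run out).
    """
    # Sort by decreasing out-degree; ties are broken by node id.
    ranked = sorted(out_degree, key=lambda node: (-out_degree[node], node))
    seeds = [set(ranked[:shared]) for _ in range(q)]
    turn = 0
    for node in ranked[shared:]:
        if all(len(s) >= per_influence for s in seeds):
            break
        seeds[turn].add(node)
        turn = (turn + 1) % q
    return seeds


# Hypothetical toy network: node i has out-degree 10 - i.
out_degree = {node: 10 - node for node in range(10)}
seeds = assign_competing_seeds(out_degree, per_influence=4, shared=2)
first_initial_set = set().union(*seeds)  # assumed reading of S_C
```

With the default arguments the function reproduces the numbers of the example: 25 shared nodes plus 25 individually assigned nodes per influence.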
Fig. 2 shows the three states of a node during propagation under the propagation model and the transitions between them. A node is initially in the inactive state; it changes to the activated state if it is successfully activated by a neighbor node, and otherwise remains inactive. When an activated node has received the influences propagated by one or more nodes, it selects and adopts one of them; its state then becomes adopted and remains so until the propagation process ends.
The propagation process of the propagation model is further explained below with reference to fig. 3. In fig. 3, circles represent nodes, lines with arrows represent directed edges, and the numbers in a circle represent the influences received by that node. The sectors of a circle indicate the probabilities of adopting the corresponding influences, a bold arrowed line indicates an edge on which activation is being attempted, and a dashed line indicates a failed activation.
1) At time 0, the different influences activate their respective initial nodes to start the propagation process. Different influences may select the same node as an initial node.
In fig. 3, at time t = 0, the three nodes a, b and c are activated as initial nodes: a and c are activated by influences 1 and 4, respectively, and node b is activated by the three influences 2, 3 and 4.
2) At time t, each node that was activated at time t-1 is in the activated state and attempts to activate its inactive neighbor nodes. If an activation attempt succeeds, the newly activated node receives all the influences of the activating node (the same inactive node may be activated by several nodes in the activated state and thus receive several influences, and it may receive the same influence more than once); if the attempt fails, the inactive node remains in the inactive state. Afterwards, a node in the activated state adopts one of the influences it has received, with the probability defined by the conformity adoption function, and enters the adopted state.
The probability defined by the conformity adoption function h(u, I_i) is as follows:
Figure GDA0002377762310000161
where h(u, I_i) denotes the probability that node u adopts influence I_i, I is the set of all influences, and Freq(I_i) denotes the number of times that node u has received influence I_i. The following formula is used in the actual calculation:
Figure GDA0002377762310000162
where NA(u, I_i) denotes the set of in-neighbor nodes that activated node u and propagated influence I_i to it.
The conformity adoption probability function satisfies 0 ≤ h(u, I_i) ≤ 1 and Σ_{1≤i≤|I|} h(u, I_i) = 1. In fig. 3, at time t = 1, the three nodes a, b and c are in the activated state and attempt to activate the three inactive nodes d, e and f. Node c fails to activate e; node e is activated by both a and b and receives the four influences of these two nodes. Finally, each of the nodes d, e and f selects one of the influences it has received. Node e has received four influences, each exactly once, so each of them is adopted with probability 1/4. At time t = 2 in fig. 3, node d cannot activate e because node e is already in the adopted state. Node g is activated by three nodes at the same time and receives four influences, of which influences 1 and 4 are received twice and the other two once. Influences 1 and 4 are therefore adopted with the higher probability of 2/6 each, and the remaining two influences with probability 1/6 each.
3) The propagation process terminates when no new nodes are activated.
In fig. 3, no new node can be activated after time t = 2, and the propagation process ends.
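The propagation rules 1) to 3) can be sketched as a small forward simulation. This is an illustrative reading, not the patented procedure itself: the function name and data layout are hypothetical, adoption is drawn in proportion to the reception counts, and the toy edge list is an assumption chosen only to be consistent with the textual description of fig. 3 (the figure itself is not reproduced here). The activation of e by c is forced to fail by giving that edge probability 0.

```python
import random
from collections import Counter


def simulate(out_neighbors, prob, seeds, rng):
    """Propagate competing influences and let every reached node adopt one.

    seeds maps each initial node to the influences that start there.
    Returns (received, adopted): reception counts per node and the
    influence each node finally adopts.
    """
    received = {node: Counter(influences) for node, influences in seeds.items()}
    frontier = list(seeds)
    while frontier:
        newly = {}
        for u in frontier:
            for v in out_neighbors.get(u, ()):
                if v in received:  # activated in an earlier time step
                    continue
                if rng.random() < prob[(u, v)]:
                    # v receives every distinct influence held by u once.
                    newly.setdefault(v, Counter()).update(received[u].keys())
        received.update(newly)
        frontier = list(newly)
    adopted = {}
    for node, counts in received.items():
        influences = sorted(counts)
        weights = [counts[i] for i in influences]
        adopted[node] = rng.choices(influences, weights=weights)[0]
    return received, adopted


# Assumed edges, consistent with the description of fig. 3.
out_neighbors = {"a": ["d", "e"], "b": ["e"], "c": ["e", "f"],
                 "d": ["e", "g"], "e": ["g"], "f": ["g"]}
prob = {(u, v): 1.0 for u, vs in out_neighbors.items() for v in vs}
prob[("c", "e")] = 0.0  # the failed activation of e by c
seeds = {"a": [1], "b": [2, 3, 4], "c": [4]}
received, adopted = simulate(out_neighbors, prob, seeds, random.Random(0))
```

In this run node g ends up with influences 1 and 4 counted twice and influences 2 and 3 counted once, which matches the 2/6 and 1/6 adoption probabilities of the worked example.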
(2) obtaining a threshold T_0 by the SSA algorithm according to the preset precision parameters ε and δ, performing T_0 reverse adoption samplings on the influence graph obtained in step (1), and putting the T_0 resulting samples into the sample set
Figure GDA0002377762310000171
where 0 < ε < 1 and 0 < δ < 1;
specifically, the threshold T_0 is calculated by the formula
Figure GDA0002377762310000172
where the function
Figure GDA0002377762310000173
and n denotes the total number of nodes in the influence graph.
In this step, a sample is the data obtained by a single reverse adoption sampling, and specifically includes: the node set R; the set {d(u) | u ∈ R} of distances between each node u of the node set R and the node w; the set {NA(u) | u ∈ R} of the activated-node sets of the nodes in the node set R; the distance d_C between the node w and the first initial node set S_C; and the total number of times Time_C that node w is activated by nodes of the first initial node set S_C.
The reverse adoption sampling in this step comprises the following substeps:
(2-1) randomly selecting a node w from the influence graph, setting the distance between node w and itself to d(w) = 0, putting w into the node set R, setting the distance between the initial node set S_C and node w to d_C = ∞, and setting a variable curDist = 0, where curDist denotes the distance between the currently processed node and node w (the current distance for short);
(2-2) judging whether node w is an element of the first initial node set S_C; if so, recording the total number of times Time_C that node w is activated by competing influences, setting the distance between the initial node set S_C and node w to d_C = 0, and going to step (2-17); otherwise, going to step (2-3);
(2-3) setting a counter c1 = 1 and the current distance curDist = curDist + 1, and judging whether the value of c1 is greater than the total number of in-neighbor nodes of node w; if so, going to step (2-17); otherwise, going to step (2-4);
(2-4) deciding, according to the probability that the c1-th in-neighbor node w_c1 of node w activates node w, whether node w_c1 successfully activates node w; if so, recording the distance between node w_c1 and node w as d(w_c1) = curDist, putting node w_c1 into its own activated-node set NA(w_c1), putting node w_c1 into a node set V, and going to step (2-5); otherwise, going to step (2-6);
(2-5) judging whether node w_c1 is an element of the first initial node set S_C; if so, setting the distance d_C between the initial node set S_C and node w to the current distance curDist and going to step (2-6); otherwise, going directly to step (2-6);
(2-6) judging whether the value of c1 is equal to the total number of in-neighbor nodes of node w; if so, going to step (2-8); otherwise, going to step (2-7);
(2-7) setting the counter c1 = c1 + 1 and returning to step (2-4);
(2-8) judging whether the node set V and the first initial node set S_C intersect, or whether the node set V is empty; if so, calculating and recording the total number of times Time_C that node w is activated by the competing influences propagated by the nodes in the intersection of V and S_C, and going to step (2-17); otherwise, putting all nodes of the node set V into the node set R and going to step (2-9);
(2-9) setting a counter c2 = 1 and the current distance curDist = curDist + 1;
(2-10) selecting the c2-th node w_c2 from the node set V, setting a counter c3 = 1, and judging whether the value of the counter c3 is greater than the total number of in-neighbor nodes of node w_c2; if so, going to step (2-14); otherwise, going to step (2-11);
(2-11) judging whether the c3-th in-neighbor node w_c3 of node w_c2 is not in the node set R and, according to the probability that node w_c3 activates node w_c2, whether node w_c3 successfully activates node w_c2; if both hold, setting the distance between node w_c3 and node w to d(w_c3) = curDist, putting the elements of the activated-node set NA(w_c2) of node w_c2 into the activated-node set NA(w_c3) of node w_c3, putting node w_c3 into a node set V_next, and going to step (2-12); otherwise, going to step (2-13);
(2-12) judging whether node w_c3 is an element of the first initial node set S_C; if so, setting the distance d_C between the initial node set S_C and node w equal to curDist and going to step (2-13); otherwise, going directly to step (2-13);
(2-13) judging whether the value of the counter c3 is greater than the total number of in-neighbor nodes of node w_c2; if so, going to step (2-14); otherwise, setting the counter c3 = c3 + 1 and returning to step (2-11);
(2-14) judging whether the value of the counter c2 is equal to the total number of nodes in the node set V; if so, going to step (2-15); otherwise, setting the counter c2 = c2 + 1 and returning to step (2-10);
(2-15) judging whether the node set V_next and the first initial node set S_C intersect, or whether the node set V_next is empty; if so, obtaining the total number of times Time_C that node w is activated by the competing influences propagated by the nodes in the intersection of V_next and S_C, and going to step (2-17); otherwise, going to step (2-16);
(2-16) replacing all nodes in the node set V with all nodes in the node set V_next, putting all nodes of the node set V into the node set R, and returning to step (2-8);
(2-17) obtaining the node set R, the set {d(u) | u ∈ R} of distances between each node of the node set R and node w, the set {NA(u) | u ∈ R} of the activated-node sets of the nodes in the node set R, the distance d_C between node w and the first initial node set S_C, and the total number of times Time_C that node w is activated by nodes of the first initial node set S_C.
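The layer-by-layer structure of substeps (2-1) to (2-17) can be sketched as a reverse breadth-first search. This is a simplified sketch under stated assumptions, not the exact procedure: the function name and the dictionary-based sample are hypothetical, the root is passed in instead of being drawn at random, the layer that reaches S_C is kept in R (so that nodes at distance d_C are available to later steps), and Time_C is left out because its formula is given only in a figure.

```python
import random


def reverse_adoption_sample(in_neighbors, prob, competitor_seeds, root, rng):
    """Grow a reverse sample from `root` until a layer reaches S_C or is empty.

    prob[(u, v)] is the probability that in-neighbor u activates v.
    Returns the node set R, the distances d(u), the sets NA(u) and d_C.
    """
    inf = float("inf")
    visited = {root}              # node set R
    dist = {root: 0}              # d(u): distance from u to the root
    na = {root: set()}            # NA(u): root's in-neighbors through which u acts
    d_c = 0 if root in competitor_seeds else inf
    layer, cur_dist = [root], 0
    while layer and d_c == inf:
        cur_dist += 1
        next_layer = {}
        for v in layer:
            for u in in_neighbors.get(v, ()):
                if u in visited:  # already placed in an earlier layer
                    continue
                if rng.random() < prob[(u, v)]:
                    via = {u} if v == root else na[v]
                    next_layer.setdefault(u, set()).update(via)
        for u, via in next_layer.items():
            dist[u] = cur_dist
            na[u] = via
            if u in competitor_seeds:
                d_c = cur_dist
        visited.update(next_layer)
        layer = list(next_layer)
    return {"R": visited, "d": dist, "NA": na, "d_C": d_c}


# Hypothetical toy graph: w has in-neighbors a and b; s reaches both.
in_neighbors = {"w": ["a", "b"], "a": ["s"], "b": ["s", "t"]}
prob = {(u, v): 1.0 for v, us in in_neighbors.items() for u in us}
sample = reverse_adoption_sample(in_neighbors, prob, {"s"}, "w", random.Random(0))
```

To follow substep (2-1) literally, the caller would pick the root with `rng.choice` over the list of nodes before calling the function.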
(3) using the sample set returned in step (2),
Figure GDA0002377762310000192
iteratively selecting from the influence graph the k nodes with the largest marginal gain as the second initial node set S, and obtaining the biased estimate of the adoption of the second initial node set S,
Figure GDA0002377762310000193
where 0 < k < n;
Because the influence graph is a probabilistic graph, activation and adoption are random, so the number of nodes that adopt a given influence is also random. Therefore, for the propagation of multiple influences in a probabilistic directed network, the adoption function f(S) is defined as the expected number of nodes that adopt an influence when the node set S is used as its initial node set. Since computing the adoption f(S) of the second initial node set S is an NP-hard problem, f(S) is estimated from the samples obtained in step (2), using the following formula:
Figure GDA0002377762310000191
where S is the initial node set, n is the total number of nodes in the network, A_i is the adoption gain of the i-th sampling,
Figure GDA0002377762310000201
is the actual number of samples, and
Figure GDA0002377762310000202
is the average adoption, taken over all samples in
Figure GDA0002377762310000203
Using the expected number of adopting nodes as the measure gives a more reasonable assessment of the gain of the initial nodes.
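Since the estimation formula is available only as a figure, the following sketch shows one reading that is consistent with the listed symbols: the estimate of f(S) is taken to be n times the average per-sample adoption A_i. Both this form and the function name are assumptions.

```python
def estimate_adoption(n, sample_adoptions):
    """Estimate f(S) as n times the mean per-sample adoption A_i (assumed form)."""
    if not sample_adoptions:
        raise ValueError("at least one sample is required")
    mean_adoption = sum(sample_adoptions) / len(sample_adoptions)
    return n * mean_adoption


# Hypothetical values: 10 nodes, four samples with adoptions A_1..A_4.
estimate = estimate_adoption(10, [1.0, 0.5, 0.0, 0.5])
```

Each A_i lies between 0 and 1, so the estimate always lies between 0 and n.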
In this step, the specific process of iteratively selecting from the influence graph the k nodes with the largest marginal gain as the initial node set is as follows:
(3-1) setting
Figure GDA0002377762310000204
setting a counter c4 = 1, setting the initial value of the total adoption of every node of the influence graph on the sample set
Figure GDA0002377762310000205
to 0, and setting the total adoption of the second initial node set S on the sample set
Figure GDA0002377762310000206
to SUM_1 = 0;
(3-2) setting a counter c5 = 1;
(3-3) taking the c5-th sample R_c5 out of the sample set
Figure GDA0002377762310000207
and setting a counter c6 = 1;
(3-4) taking the c6-th node w_c6 out of the sample R_c5 and judging whether the distance d(w_c6) between node w_c6 and node w in R_c5 is less than d_C; if so, setting the adoption of node w_c6 on R_c5 to 1, adding 1 to the total adoption of node w_c6, and going to step (3-5); otherwise, setting the adoption of node w_c6 on R_c5 to
Figure GDA0002377762310000208
and updating the total adoption of w_c6 on the sample set
Figure GDA0002377762310000209
to the total adoption of node w_c6 on the sample set
Figure GDA00023777623100002010
plus
Figure GDA00023777623100002011
and then going to step (3-5); here NA(w_c6) and Time_C denote, respectively, the activated-node set of w_c6 in the sample R_c5 and the total number of times that node w of the sample R_c5 is activated by the influences propagated by nodes of the first initial node set S_C.
(3-5) judging whether the counter c6 is less than |R_c5|; if so, setting c6 = c6 + 1 and returning to step (3-4); otherwise, going to step (3-6);
(3-6) judging whether the counter c5 is less than
Figure GDA00023777623100002012
if so, setting c5 = c5 + 1 and returning to step (3-3); otherwise, going to step (3-7);
(3-7) using a max-heap to arrange all nodes of the influence graph in descending order of total adoption, the total adoption of each node being taken on the sample set
Figure GDA00023777623100002013
(3-8) judging whether the counter c4 is less than the preset threshold k; if so, taking out the node w_c4 with the largest total adoption on
Figure GDA00023777623100002014
adding node w_c4 to the second initial node set S, setting SUM_1 = SUM_1 + A(w_c4), where A(w_c4) is the total adoption of node w_c4 on
Figure GDA0002377762310000211
removing node w_c4 from the max-heap, and going to step (3-9); otherwise, obtaining the initial node set S and the biased estimate of its adoption
Figure GDA0002377762310000212
and then going to step (3-22);
(3-9) taking all samples that contain node w_c4 out of the sample set
Figure GDA0002377762310000213
to form the sample set
Figure GDA0002377762310000214
and setting a counter c7 = 1;
(3-10) taking out, from the sample set
Figure GDA0002377762310000215
the c7-th sample R_c7, and judging whether the sample R_c7 has been marked as adoption-determined; if so, going to step (3-21); otherwise, going to step (3-11);
(3-11) judging whether the distance d(w_c4) between w_c4 and node w of the sample R_c7 is less than the distance d_C between the initial node set S_C and node w in R_c7; if so, marking the sample R_c7 as adoption-determined and going to step (3-12); otherwise, going to step (3-15);
(3-12) setting a counter c8 = 1;
(3-13) taking out the c8-th node w_c8 of the sample R_c7, updating the total adoption of w_c8 on the sample set
Figure GDA0002377762310000216
to the total adoption of node w_c8 on the sample set
Figure GDA0002377762310000217
minus the adoption of node w_c8 on R_c7, and updating the position of node w_c8 in the max-heap;
(3-14) judging whether the counter c8 is less than |R_c7|; if so, setting c8 = c8 + 1 and going to step (3-13); otherwise, going to step (3-21);
(3-15) setting a counter c9 = 1;
(3-16) taking out the c9-th node w_c9 of the sample R_c7 and judging whether the distance d(w_c9) between w_c9 and node w of the sample R_c7 is less than the distance d_C between the initial node set S_C and node w in R_c7; if so, going to step (3-17); otherwise, going to step (3-18);
(3-17) updating the adoption of node w_c9 on R_c7 to the adoption of node w_c9 on R_c7 minus the adoption of node w_c4 on R_c7, and updating the total adoption of node w_c9 on the sample set
Figure GDA0002377762310000218
to the total adoption of node w_c9 on the sample set
Figure GDA0002377762310000219
minus the adoption of node w_c4 on R_c7;
(3-18) updating the total adoption of node w_c9 on the sample set
Figure GDA00023777623100002110
to the total adoption of node w_c9 on the sample set
Figure GDA00023777623100002111
minus the adoption of node w_c9 on R_c7; updating the adoption of node w_c9 on R_c7 to
Figure GDA0002377762310000221
where
Figure GDA0002377762310000222
is the size of the union of the activated-node sets of the nodes in the set S, i.e., the total number of activations of the second initial node set S on R_c7; and updating the total adoption of node w_c9 on the sample set
Figure GDA0002377762310000223
again, to the total adoption of node w_c9 on the sample set
Figure GDA0002377762310000224
plus the updated adoption of node w_c9 on R_c7;
(3-19) updating the position of node w_c9 in the max-heap;
(3-20) judging whether the counter c9 is less than |R_c7|; if so, setting c9 = c9 + 1 and going to step (3-16); otherwise, going to step (3-21);
(3-21) judging whether the counter c7 is less than
Figure GDA0002377762310000225
if so, setting c7 = c7 + 1 and going to step (3-10); otherwise, going to step (3-8);
(3-22) judging whether SUM_1 is greater than a threshold T_1; if so, going to step (4); otherwise, updating the number of samplings used in step (2) to T_0 = 2*T_0 and returning to step (2). According to the SSA algorithm, the threshold T_1 is calculated by the formula
Figure GDA0002377762310000226
Wherein
Figure GDA0002377762310000227
ε_2 = ε_3 = (1 - 1/e)^(-1) · ε/2, and the remaining variables are the same as in step (2).
The adoption function f(S) is non-negative, monotonically non-decreasing and submodular. Therefore, the adoption of the set S computed with the greedy strategy satisfies
Figure GDA0002377762310000228
where
Figure GDA0002377762310000229
is the adoption, on the sample set
Figure GDA00023777623100002210
of the optimal initial node set S+.
In the above process, for each sample R in the sample set
Figure GDA00023777623100002214
the adoption of every node u in R is calculated, and the adoptions of each node over the different samples R are then accumulated to obtain the total adoption of each node of the influence graph on the sample set
Figure GDA00023777623100002211
(steps (3-1) to (3-6)). Then, according to the total adoption of the nodes on the sample set
Figure GDA00023777623100002212
a max-heap is built in which the nodes are arranged in descending order of their total adoption on the sample set
Figure GDA00023777623100002213
so that the top of the heap is the node with the largest total adoption. The node at the top of the heap, i.e., the node with the largest adoption, is selected as a new initial node. Because the adoption ranges of different nodes overlap, the adoption of the nodes that share a sample R with the newly selected initial node decreases, so the gains of these nodes must be updated and their positions in the max-heap adjusted. Selecting the node at the top of the heap and updating the gains of the other nodes is repeated until k nodes have been selected (steps (3-7) to (3-21)).
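The select-then-update cycle of steps (3-7) to (3-21) can be illustrated with a deliberately simplified sketch. It is not the procedure of the invention: every node's adoption on a sample is taken to be 1, and a sample stops contributing as soon as one of its nodes is selected, so the fractional adoptions under competition are not reproduced. Because Python's `heapq` is a min-heap without a decrease-key operation, gains are stored negated and outdated entries are skipped when popped. The function name and the set-based samples are assumptions.

```python
import heapq


def greedy_select(samples, k):
    """Pick up to k nodes by largest marginal gain with incremental updates.

    samples: list of sets of node ids (simplified: adoption 1 per sample).
    Returns (selected nodes in order, total adoption of the selection).
    """
    gain, samples_of = {}, {}
    for idx, sample in enumerate(samples):
        for node in sample:
            gain[node] = gain.get(node, 0) + 1
            samples_of.setdefault(node, []).append(idx)

    heap = [(-g, node) for node, g in gain.items()]  # negated for a max-heap
    heapq.heapify(heap)

    covered = [False] * len(samples)
    selected, total = [], 0
    while heap and len(selected) < k:
        neg_gain, node = heapq.heappop(heap)
        if -neg_gain != gain[node]:
            continue  # outdated entry: the gain changed after it was pushed
        selected.append(node)
        total += gain[node]
        for idx in samples_of[node]:
            if covered[idx]:
                continue
            covered[idx] = True
            # Only nodes sharing this sample lose gain (incremental update).
            for other in samples[idx]:
                if other != node:
                    gain[other] -= 1
                    heapq.heappush(heap, (-gain[other], other))
    return selected, total


# Hypothetical samples over nodes 1..4.
samples = [{1, 2}, {2, 3}, {2}, {4}, {4}]
selected, total = greedy_select(samples, 2)
```

Only the nodes that share a sample with the newly selected node are touched after each selection, which is the source of the savings that the incremental update is meant to provide.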
(4) setting a counter c10 = 0, which records the number of reverse adoption samplings performed in step (5), and setting the total adoption of the second initial node set S obtained in step (3) to SUM_2 = 0;
(5) performing one reverse adoption sampling on the influence graph, calculating the adoption A_c10 obtained in this sampling by the second initial node set S obtained in step (3), and updating the total adoption of the second initial node set S as SUM_2 = SUM_2 + A_c10;
In this step, performing reverse adoption sampling on the influence graph and calculating the adoption A_c10 obtained in the current sampling by the second initial node set S obtained in step (3) comprises the following substeps:
(5-1) randomly selecting a node z from the influence graph, putting node z into a node set R, and setting the distance between the first initial node set S_C and node z to d_C = ∞, the distance between the second initial node set S and node z to d_S = ∞, the current distance curDist = 0, and the adoption of the second initial node set S in this sampling to A_c10 = 0;
(5-2) judging whether node z is an element of the first initial node set S_C; if so, setting d_C = curDist and going to step (5-3); otherwise, going directly to step (5-3);
(5-3) judging whether node z is an element of the second initial node set S; if so, setting d_S = curDist and going to step (5-4); otherwise, going directly to step (5-4);
(5-4) judging whether d_C is equal to curDist; if so, going to step (5-5); otherwise, going to step (5-6);
(5-5) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to
Figure GDA0002377762310000231
where Time_C is the total number of times that node z is activated by competing influences, and going to step (5-30); otherwise, setting the adoption of the second initial node set S in this sampling to A_c10 = 0 and going to step (5-30);
(5-6) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to A_c10 = 1 and going to step (5-30); otherwise, going to step (5-7);
(5-7) setting a counter c11 = 1 and the current distance curDist = curDist + 1;
(5-8) judging whether the value of c11 is greater than the total number of in-neighbor nodes of node z; if so, going to step (5-13); otherwise, going to step (5-9);
(5-9) deciding, according to the probability that the c11-th in-neighbor node z_c11 of node z activates node z, whether node z_c11 successfully activates node z; if so, putting node z_c11 into its own activated-node set NA(z_c11), putting node z_c11 into a node set V, and going to step (5-10); otherwise, going to step (5-11);
(5-10) judging whether node z_c11 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding node z_c11 to a node set Set_C, and going to step (5-11); otherwise, going directly to step (5-11);
(5-11) judging whether node z_c11 is an element of the second initial node set S; if so, setting d_S = curDist, adding node z_c11 to a node set Set_S, and going to step (5-12); otherwise, going directly to step (5-12);
(5-12) setting the counter c11 = c11 + 1 and going to step (5-8);
(5-13) judging whether d_C is equal to curDist; if so, going to step (5-14); otherwise, going to step (5-15);
(5-14) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to
Figure GDA0002377762310000241
where
Figure GDA0002377762310000242
is the total number of times that node z is activated by competing influences and InfSet_C(u) is the set of competing influences propagated by a node u of the node set Set_C, and going to step (5-30); otherwise, setting the adoption of the second initial node set S in this sampling to A_c10 = 0 and going to step (5-30);
(5-15) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to A_c10 = 1 and going to step (5-30); otherwise, going to step (5-16);
(5-16) if the node set V is not empty, d_S is not equal to curDist and d_C is not equal to curDist, putting all nodes of the node set V into the node set R and going to step (5-17); otherwise, going to step (5-27);
(5-17) setting a counter c12 = 1 and the current distance from node z to curDist = curDist + 1;
(5-18) judging whether the value of c12 is greater than the total number of nodes in the node set V; if so, going to step (5-26); otherwise, going to step (5-19);
(5-19) selecting the c12-th node z_c12 from the node set V and setting a counter c13 = 1;
(5-20) judging whether the value of c13 is greater than the total number of in-neighbor nodes of node z_c12; if so, going to step (5-25); otherwise, going to step (5-21);
(5-21) judging whether the c13-th in-neighbor node z_c13 of node z_c12 is not in the node set R and, according to the probability that node z_c13 activates node z_c12, whether node z_c13 successfully activates node z_c12; if both hold, putting the elements of the activated-node set NA(z_c12) of node z_c12 into the activated-node set NA(z_c13) of node z_c13, putting node z_c13 into a node set V_next, and going to step (5-22); otherwise, going to step (5-24);
(5-22) judging whether node z_c13 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding node z_c13 to the node set Set_C, and going to step (5-23); otherwise, going directly to step (5-23);
(5-23) judging whether node z_c13 is an element of the second initial node set S; if so, setting d_S = curDist, adding node z_c13 to the node set Set_S, and going to step (5-24); otherwise, going directly to step (5-24);
(5-24) setting the counter c13 = c13 + 1 and returning to step (5-20);
(5-25) setting the counter c12 = c12 + 1 and returning to step (5-18);
(5-26) replacing all nodes in the node set V with all nodes in the node set V_next and returning to step (5-16);
(5-27) Determine whether d_C is equal to curDist; if so, go to step (5-28); otherwise, go to step (5-29);
(5-28) Determine whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is given by [Figure GDA0002377762310000261], where [Figure GDA0002377762310000262] is the size of the union of the activated node sets of the nodes in the set S, and go to step (5-30); otherwise, the adoption of the second initial node set S in this sample is A_c10, and go to step (5-30);
(5-29) Determine whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is A_c10 = 1, and go to step (5-30); otherwise, go directly to step (5-30);
(5-30) Return the adoption A_c10.
(6) Determine whether counter c10 is less than the threshold T_2 and whether the total adoption SUM_2 of the second initial node set S is less than the threshold T_3; if so, set c10 = c10 + 1 and return to step (5); otherwise, output the unbiased estimate [Figure GDA0002377762310000263] and go to step (7);
According to the SSA algorithm, the threshold T_2 is calculated by the formula [Figure GDA0002377762310000264] and the threshold T_3 is calculated by the formula [Figure GDA0002377762310000265]. The variables in these formulas are exactly the same as those described in steps (2) and (3-22).
(7) Determine whether the biased estimate [Figure GDA0002377762310000266] of the second initial node set S obtained in step (3) and the unbiased estimate [Figure GDA0002377762310000267] of the second initial node set S obtained in step (6) satisfy [Figure GDA0002377762310000268]; if so, directly output the second initial node set S as the result and end the process; otherwise, update the number of samples used in step (2) to T_0 = 2*T_0 and return to step (2).
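Steps (4) to (7) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation. The formulas for T_2, T_3, the unbiased estimate, and the acceptance test are images that are not reproduced in this text, so the thresholds are passed in as plain parameters, the unbiased estimate is assumed to be n times the mean adoption per sample, and the acceptance test is assumed to have the form biased ≤ (1 + ε_1) · unbiased. The names run_verification_sampling and estimates_agree are hypothetical.

```python
from typing import Callable


def run_verification_sampling(sample_adoption: Callable[[], float],
                              n: int, t2: int, t3: float) -> float:
    """Steps (4)-(6): keep sampling while c10 < T_2 and SUM_2 < T_3."""
    c10 = 0        # step (4): sample counter
    sum2 = 0.0     # step (4): total adoption SUM_2
    while True:
        sum2 += sample_adoption()          # step (5): one reverse sample
        if c10 < t2 and sum2 < t3:         # step (6): continue sampling
            c10 += 1
        else:
            break
    samples_drawn = c10 + 1
    # ASSUMPTION: estimate = n * mean adoption (the source formula is an image).
    return n * sum2 / samples_drawn


def estimates_agree(biased: float, unbiased: float, eps1: float) -> bool:
    """Step (7), ASSUMED form of the test: biased <= (1 + eps1) * unbiased."""
    return biased <= (1.0 + eps1) * unbiased
```

If estimates_agree returns False, the procedure described above doubles T_0 and returns to step (2).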
In summary, the conformity-aware propagation model provided by the invention takes into account the competition among multiple influences during propagation and, for the first time, introduces conformity behavior into the influence adoption process as a basis for decisions, which makes the propagation model more reasonable and better grounded. A reverse adoption sampling method is designed for the conformity-aware propagation model; it solves the problem that the reverse sampling method for propagation is not suitable for sampling with multiple influences, and it widens the range of application of the method. The initial node selection method incrementally updates the adoption gains of the nodes, which avoids a large amount of repeated calculation and improves computational efficiency.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. An initial node selection method for realizing influence maximization, characterized by comprising the following steps:
(1) constructing an influence graph according to the established propagation model, and generating, according to the constructed influence graph, a first initial node set S_C for propagating other competing influences;
(2) obtaining a threshold T_0 by using the Stop-and-Stare (SSA) algorithm according to preset precision parameters ε and δ, performing T_0 reverse adoption samplings on the influence graph obtained in step (1), and putting the T_0 samples obtained into a sample set [Figure FDA0002402710200000011]; wherein the threshold T_0 is calculated by the formula [Figure FDA0002402710200000012], wherein the function is [Figure FDA0002402710200000013] [Figure FDA0002402710200000014], and n represents the total number of nodes in the influence graph;
(3) according to the sample set [Figure FDA0002402710200000015] returned in step (2), iteratively selecting the k nodes with the largest marginal gain from the influence graph as a second initial node set S, and obtaining the biased estimate [Figure FDA0002402710200000016] of the adoption of the second initial node set S, wherein 0<k<n;
(4) setting counter c10 = 0, and setting the total adoption of the second initial node set S obtained in step (3) to SUM_2 = 0;
(5) performing reverse adoption sampling on the influence graph, calculating the adoption A_c10 obtained in this sample by the second initial node set S obtained in step (3), and updating the total adoption of the second initial node set S obtained in step (3) to SUM_2 = SUM_2 + A_c10;
(6) determining whether counter c10 is less than the threshold T_2 and whether the total adoption SUM_2 of the second initial node set S is less than the threshold T_3; if so, setting c10 = c10 + 1 and returning to step (5); otherwise, outputting the unbiased estimate [Figure FDA0002402710200000017] and going to step (7), wherein n represents the total number of nodes in the influence graph;
(7) determining whether the biased estimate [Figure FDA0002402710200000021] of the second initial node set S obtained in step (3) and the unbiased estimate [Figure FDA0002402710200000022] of the second initial node set S obtained in step (6) satisfy [Figure FDA0002402710200000023]; if so, directly outputting the second initial node set S as the result and ending the process; otherwise, updating the number of samples used in step (2) to T_0 = 2*T_0 and returning to step (2), wherein ε_1 represents the weight.
2. The initial node selection method according to claim 1, wherein the influence graph is represented as G = (V, E, p), where V denotes the set of nodes, E denotes the set of directed edges, and p denotes the function of pre-assigned activation probabilities of all edges; each edge (u, v) has a pre-assigned activation probability p(u, v) ∈ [0,1], which denotes the probability that node u activates node v.
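The influence graph G = (V, E, p) of this claim can be held in a small data structure. The following Python sketch is only an illustration; the class name InfluenceGraph and its methods are hypothetical and do not come from the patent. In-neighbors are indexed directly because the reverse sampling steps walk edges backwards.

```python
from collections import defaultdict


class InfluenceGraph:
    """Directed graph with an activation probability p(u, v) on every edge."""

    def __init__(self):
        self.nodes = set()
        # in_edges[v][u] = p(u, v): probability that u activates v
        self.in_edges = defaultdict(dict)

    def add_edge(self, u, v, p):
        if not 0.0 <= p <= 1.0:
            raise ValueError("activation probability must lie in [0, 1]")
        self.nodes.update((u, v))
        self.in_edges[v][u] = p

    def in_neighbors(self, v):
        """Return a list of (u, p(u, v)) pairs for all in-neighbors u of v."""
        return list(self.in_edges.get(v, {}).items())
```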
3. The initial node selection method of claim 1, wherein the propagation process of the propagation model is as follows:
first, at time 0, the different influences activate their respective initial nodes to start the propagation process; different influences may select the same node as an initial node;
then, at time t, each node that was activated at time t-1 is in the activated state and attempts to activate its neighbor nodes that are not yet activated; if an activation attempt succeeds, the newly activated node receives all the influences of the node that activated it; if the attempt fails, the node that was not activated remains in the inactive state; afterwards, a node in the activated state adopts one of the influences it has received according to the probability defined by the conformity adoption function, and finally enters the adopted state;
the probability defined by the conformity adoption function h(u, I_i) is as follows:
[Figure FDA0002402710200000024]
wherein h(u, I_i) represents the probability that node u adopts influence I_i, I is the set of all influences, and NA(u, I_i) denotes the set of neighbor nodes that activated node u and propagated influence I_i to it;
finally, the propagation process terminates when no new nodes are activated.
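The adoption step of this propagation process can be illustrated with a short Python sketch. The formula for h(u, I_i) is an image that is not reproduced in this text, so the sketch ASSUMES a plain proportional rule: the probability of adopting I_i is |NA(u, I_i)| divided by the sum of |NA(u, I_j)| over all influences. That rule, and the function names adoption_probabilities and adopt, are illustrative assumptions rather than the patented definition.

```python
import random


def adoption_probabilities(activators_by_influence):
    """Map each influence to an adoption probability.

    activators_by_influence: dict mapping an influence to the set NA(u, I_i)
    of neighbors that activated u and propagated that influence.
    ASSUMPTION: probability is proportional to |NA(u, I_i)|.
    """
    counts = {inf: len(nbrs) for inf, nbrs in activators_by_influence.items()}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {inf: c / total for inf, c in counts.items()}


def adopt(activators_by_influence, rng):
    """Draw the single influence that node u adopts, or None if none arrived."""
    probs = adoption_probabilities(activators_by_influence)
    if not probs:
        return None
    influences = list(probs)
    weights = [probs[i] for i in influences]
    return rng.choices(influences, weights=weights, k=1)[0]
```

Under this assumed rule, a node activated by three neighbors carrying I1 and one neighbor carrying I2 adopts I1 with probability 0.75.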
4. The initial node selection method of claim 1, wherein each of the T_0 samples in step (2) is the data obtained by a single reverse adoption sampling, and comprises a node set R, the set {d(u) | u ∈ R} of distances between each node in the node set R and node w, the set {NA(u) | u ∈ R} of the activated node sets of each node in the node set R, the distance d_C between node w and the first initial node set S_C, and the total number of times Time_C that node w is activated by the nodes in the first initial node set S_C.
5. The initial node selection method according to any one of claims 1 to 4, wherein the process of a single reverse adoption sampling in step (2) comprises the following sub-steps:
(2-1) randomly selecting a node w from the influence graph, setting the distance between node w and itself to d(w) = 0, putting w into the node set R, initializing the distance d_C between the first initial node set S_C and node w, and setting a variable curDist = 0, wherein curDist represents the distance between the currently processed node and node w, called the current distance for short;
(2-2) determining whether node w is an element of the first initial node set S_C; if so, recording the total number of times Time_C that node w is activated by competing influences and the distance d_C between the initial node set S_C and node w, and going to step (2-17); otherwise, going to step (2-3);
(2-3) setting counter c1 = 1 and the current distance curDist = curDist + 1, and determining whether the value of c1 is greater than the total number of neighbor nodes of node w; if so, going to step (2-17); otherwise, going to step (2-4);
(2-4) determining, according to the probability with which the c1-th in-neighbor node w_c1 of node w activates node w, whether node w_c1 successfully activates node w; if so, recording the distance between node w_c1 and node w as d(w_c1) = curDist, putting node w_c1 into its activated node set NA(w_c1), putting node w_c1 into the node set V, and going to step (2-5); otherwise, going to step (2-6);
(2-5) determining whether node w_c1 is an element of the first initial node set S_C; if so, setting the distance d_C between the initial node set S_C and node w to the current distance curDist and going to step (2-6); otherwise, going directly to step (2-6);
(2-6) determining whether the value of c1 is equal to the total number of neighbor nodes of node w; if so, going to step (2-8); otherwise, going to step (2-7);
(2-7) setting counter c1 = c1 + 1 and returning to step (2-4);
(2-8) determining whether the node set V and the first initial node set S_C have an intersection, or whether the node set V is empty; if so, calculating and recording the total number of times Time_C that node w is activated by the competing influences propagated by the nodes in the intersection of the node set V and S_C, and going to step (2-17); otherwise, putting all nodes in the node set V into the node set R and going to step (2-9);
(2-9) setting counter c2 = 1 and the current distance curDist = curDist + 1;
(2-10) selecting the c2-th node w_c2 from the node set V, setting counter c3 = 1, and determining whether the value of counter c3 is greater than the total number of in-neighbor nodes of node w_c2; if so, going to step (2-14); otherwise, going to step (2-11);
(2-11) determining whether node w_c3 is not in the node set R and, according to the probability with which the c3-th in-neighbor node w_c3 of node w_c2 activates node w_c2, whether node w_c3 successfully activates node w_c2; if so, setting the distance between node w_c3 and node w to d(w_c3) = curDist, putting the elements of the activated node set NA(w_c2) of node w_c2 into the activated node set NA(w_c3) of node w_c3, putting node w_c3 into the node set V_next, and going to step (2-12); otherwise, going to step (2-13);
(2-12) determining whether node w_c3 is an element of the first initial node set S_C; if so, setting the distance d_C between the initial node set S_C and node w to curDist and going to step (2-13); otherwise, going directly to step (2-13);
(2-13) determining whether the value of counter c3 is greater than the total number of in-neighbor nodes of node w_c2; if so, going to step (2-14); otherwise, setting counter c3 = c3 + 1 and returning to step (2-11);
(2-14) determining whether the value of counter c2 is equal to the total number of nodes in the node set V; if so, going to step (2-15); otherwise, setting counter c2 = c2 + 1 and returning to step (2-10);
(2-15) determining whether the node set V_next and the first initial node set S_C have an intersection, or whether the node set V_next is empty; if so, obtaining the total number of times Time_C that node w is activated by the competing influences propagated by the nodes in the intersection of the node set V_next and S_C, and going to step (2-17); otherwise, going to step (2-16);
(2-16) replacing all nodes in the node set V with all nodes in the node set V_next, putting all nodes in the node set V into the node set R, and returning to step (2-8);
(2-17) obtaining the node set R, the set {d(u) | u ∈ R} of distances between each node in the node set R and node w, the set {NA(u) | u ∈ R} of the activated node sets of each node in the node set R, the distance d_C between node w and the first initial node set S_C, and the total number of times Time_C that node w is activated by the nodes in the first initial node set S_C.
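The level-by-level structure of sub-steps (2-1) to (2-17) can be sketched in Python as a reverse breadth-first traversal. This is a simplified illustration, not the claimed procedure: it marks a node as visited as soon as it is reached (the claim adds a level to R only after the level is finished), it omits the NA(u) and Time_C bookkeeping, and it ASSUMES d_C starts at infinity when w is not in S_C. The function name reverse_sample and the dictionary layout are hypothetical.

```python
import math
import random


def reverse_sample(in_edges, nodes, s_c, rng):
    """One simplified reverse sample rooted at a random node w.

    in_edges[v] maps each in-neighbor u of v to p(u, v).
    Expansion stops after the first level that reaches S_C,
    or when no new node was activated.
    """
    w = rng.choice(sorted(nodes))
    dist = {w: 0}                              # d(u) for every u in R
    d_c = 0 if w in s_c else math.inf          # ASSUMED initial value
    frontier, cur_dist = [w], 0
    while frontier and math.isinf(d_c):
        cur_dist += 1
        next_frontier = []
        for v in frontier:
            for u, p in in_edges.get(v, {}).items():
                # Each edge fires independently with probability p(u, v).
                if u not in dist and rng.random() < p:
                    dist[u] = cur_dist
                    next_frontier.append(u)
                    if u in s_c:
                        d_c = cur_dist         # finish this level, then stop
        frontier = next_frontier
    return {"root": w, "R": set(dist), "dist": dist, "d_C": d_c}
```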
6. The initial node selection method according to claim 5, wherein the process in step (3) of iteratively selecting the k nodes with the largest marginal gain as the initial node set comprises the following sub-steps:
(3-1) setting [Figure FDA0002402710200000051], setting counter c4 = 1, setting the initial adoption of every node in the influence graph on the sample set [Figure FDA0002402710200000052] to 0, and setting the total adoption of the second initial node set S on the sample set [Figure FDA0002402710200000053] to SUM_1 = 0;
(3-2) setting counter c5 = 1;
(3-3) taking the c5-th sample R_c5 from the sample set [Figure FDA0002402710200000054] and setting counter c6 = 1;
(3-4) taking the c6-th node w_c6 from the sample R_c5 and determining whether the distance d(w_c6) between node w_c6 and node w in R_c5 is less than d_C; if so, setting the adoption of node w_c6 on R_c5 to 1, adding 1 to the total adoption of node w_c6, and going to step (3-5); otherwise, setting the adoption of node w_c6 on R_c5 to [Figure FDA0002402710200000055], updating the total adoption of w_c6 on the sample set [Figure FDA0002402710200000056] to the total adoption of node w_c6 on the sample set [Figure FDA0002402710200000057] plus [Figure FDA0002402710200000058], and going to step (3-5); wherein NA(w_c6) and Time_C respectively represent the activated node set in the sample R_c5 and the total number of times that node w in the sample R_c5 is activated by the influences propagated by the nodes in the first initial node set S_C;
(3-5) determining whether counter c6 is less than |R_c5|; if so, setting c6 = c6 + 1 and returning to step (3-4); otherwise, going to step (3-6);
(3-6) determining whether counter c5 is less than [Figure FDA0002402710200000061]; if so, setting c5 = c5 + 1 and returning to step (3-3); otherwise, going to step (3-7);
(3-7) using a max-heap to arrange all nodes in the influence graph in descending order of each node's total adoption on [Figure FDA0002402710200000062];
(3-8) determining whether counter c4 is less than the preset threshold k; if so, taking out the node w_c4 with the largest total adoption on [Figure FDA0002402710200000063], adding node w_c4 to the second initial node set S, setting SUM_1 = SUM_1 + A(w_c4), wherein A(w_c4) is the total adoption of node w_c4 on [Figure FDA0002402710200000064], removing node w_c4 from the max-heap, and going to step (3-9); otherwise, obtaining the initial node set S and the biased estimate [Figure FDA0002402710200000065] of the adoption of the initial node set S, and going to step (3-22);
(3-9) taking all samples that contain node w_c4 from the sample set [Figure FDA0002402710200000066] to form a sample set [Figure FDA0002402710200000067], and setting counter c7 = 1;
(3-10) taking the c7-th sample R_c7 from [Figure FDA0002402710200000068] and determining whether the sample R_c7 has been marked as adoption-determined; if so, going to step (3-21); otherwise, going to step (3-11);
(3-11) determining whether the distance d(w_c4) between w_c4 and node w in the sample R_c7 is less than the distance d_C between the initial node set S_C and node w in R_c7; if so, marking the sample R_c7 as adoption-determined and going to step (3-12); otherwise, going to step (3-15);
(3-12) setting counter c8 = 1;
(3-13) taking the c8-th node w_c8 from the sample R_c7, updating the total adoption of w_c8 on the sample set [Figure FDA0002402710200000069] to the total adoption of node w_c8 on the sample set [Figure FDA00024027102000000610] minus the adoption of node w_c8 on R_c7, and updating the position of node w_c8 in the max-heap;
(3-14) determining whether counter c8 is less than |R_c7|; if so, setting c8 = c8 + 1 and going to step (3-13); otherwise, going to step (3-21);
(3-15) setting counter c9 = 1;
(3-16) taking the c9-th node w_c9 from the sample R_c7 and determining whether the distance d(w_c9) between w_c9 and node w in the sample R_c7 is less than the distance d_C between the initial node set S_C and node w in R_c7; if so, going to step (3-17); otherwise, going to step (3-18);
(3-17) updating the adoption of node w_c9 on R_c7 to the adoption of node w_c9 on R_c7 minus the adoption of node w_c4 on R_c7, and updating the total adoption of node w_c9 on the sample set [Figure FDA0002402710200000071] to the total adoption of node w_c9 on the sample set [Figure FDA0002402710200000072] minus the adoption of node w_c4 on R_c7;
(3-18) updating the total adoption of node w_c9 on the sample set [Figure FDA0002402710200000073] to the total adoption of node w_c9 on the sample set [Figure FDA0002402710200000074] minus the adoption of node w_c9 on R_c7; updating the adoption of node w_c9 on R_c7 to [Figure FDA0002402710200000075], wherein [Figure FDA0002402710200000076] is the size of the union of the activated node sets of the nodes in the set S, that is, the total number of activations of the second initial node set S on R_c7; and updating again the total adoption of node w_c9 on the sample set [Figure FDA0002402710200000077] to the total adoption of node w_c9 on the sample set [Figure FDA0002402710200000078] plus the updated adoption of node w_c9 on R_c7;
(3-19) updating the position of node w_c9 in the max-heap;
(3-20) determining whether counter c9 is less than |R_c7|; if so, setting c9 = c9 + 1 and going to step (3-16); otherwise, going to step (3-21);
(3-21) determining whether counter c7 is less than [Figure FDA0002402710200000079]; if so, setting c7 = c7 + 1 and going to step (3-10); otherwise, going to step (3-8);
(3-22) determining whether SUM_1 is greater than the threshold T_1; if so, going to step (4); otherwise, updating the number of samples used in step (2) to T_0 = 2*T_0 and returning to step (2).
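The max-heap bookkeeping of sub-steps (3-7) to (3-21) can be illustrated with a much simpler Python sketch. It is not the claimed procedure: it ASSUMES every sample contributes an adoption of exactly 1 to each node it contains (the claim uses fractional adoptions given by formulas not reproduced here), and it replaces the in-place heap reordering with lazy invalidation of stale heap entries, because Python's heapq module provides a min-heap without a decrease-key operation. The name greedy_select is hypothetical.

```python
import heapq


def greedy_select(samples, k):
    """Pick up to k nodes greedily by marginal gain over a list of samples.

    samples: list of sets of nodes. ASSUMPTION: a node's gain is the number
    of not-yet-covered samples that contain it.
    """
    gain, where = {}, {}
    for idx, sample in enumerate(samples):
        for node in sample:
            gain[node] = gain.get(node, 0) + 1
            where.setdefault(node, []).append(idx)
    # Negated gains turn heapq's min-heap into a max-heap.
    heap = [(-g, node) for node, g in gain.items()]
    heapq.heapify(heap)
    covered = [False] * len(samples)
    chosen, picked, total = [], set(), 0
    while heap and len(chosen) < k:
        neg_gain, node = heapq.heappop(heap)
        if node in picked or -neg_gain != gain[node]:
            continue                      # stale entry: gain changed since push
        chosen.append(node)
        picked.add(node)
        total += gain[node]
        for idx in where[node]:
            if covered[idx]:
                continue
            covered[idx] = True           # sample is now accounted for
            for other in samples[idx]:
                gain[other] -= 1          # incremental update, no full recount
                if other not in picked:
                    heapq.heappush(heap, (-gain[other], other))
    return chosen, total
```

Only the nodes that share a sample with the newly chosen node are touched after each pick, which mirrors the incremental update that the claim relies on to avoid recomputing every node's total adoption.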
7. The initial node selection method according to claim 6, wherein the adoption f(S) of the second initial node set S is obtained by the following formula:
[Figure FDA00024027102000000710]
wherein S is an initial node set, n is the total number of nodes in the network, A_i is the gain of the i-th sampling, [Figure FDA00024027102000000711] is the actual number of samples, and [Figure FDA00024027102000000712] is the average over all samples taken in [Figure FDA00024027102000000713].
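The quantities named in this claim (n, the per-sample gains A_i, the number of samples, and their average) suggest an estimator of the form n times the mean gain. The formula itself is an image that is not reproduced in this text, so the Python sketch below is an ASSUMED reading, and the name adoption_estimate is hypothetical.

```python
def adoption_estimate(gains, n):
    """ASSUMED estimator: f(S) ~ n * (A_1 + ... + A_m) / m.

    gains: the per-sample gains A_i of the node set S.
    n: total number of nodes in the network.
    """
    if not gains:
        raise ValueError("at least one sample is required")
    return n * sum(gains) / len(gains)
```

For example, gains of 1, 0, 0.5 and 0.5 on a 100-node network give an estimate of 50 under this reading.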
8. The initial node selection method according to claim 7, wherein the process in step (5) of performing reverse adoption sampling on the influence graph and calculating the adoption A_c10 obtained in this sample by the second initial node set S obtained in step (3) comprises the following sub-steps:
(5-1) randomly selecting a node z from the influence graph, putting node z into the node set R, setting the distance between the first initial node set S_C and node z to d_C = infinity, setting the distance between the second initial node set S and node z to d_S = infinity, setting the current distance curDist = 0, and setting the adoption of the second initial node set S in this sample to A_c10 = 0;
(5-2) determining whether node z is an element of the first initial node set S_C; if so, setting d_C = curDist and going to step (5-3); otherwise, going directly to step (5-3);
(5-3) determining whether node z is an element of the second initial node set S; if so, setting d_S = curDist and going to step (5-4); otherwise, going directly to step (5-4);
(5-4) determining whether d_C is equal to curDist; if so, going to step (5-5); otherwise, going to step (5-6);
(5-5) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is given by [Figure FDA0002402710200000081], wherein Time_C is the total number of times node z is activated by competing influences, and going to step (5-30); otherwise, the adoption of the second initial node set S in this sample is A_c10 = 0, and going to step (5-30);
(5-6) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is A_c10 = 1, and going to step (5-30); otherwise, going to step (5-7);
(5-7) setting counter c11 = 1 and the current distance curDist = curDist + 1;
(5-8) determining whether the value of c11 is greater than the total number of neighbor nodes of node z; if so, going to step (5-13); otherwise, going to step (5-9);
(5-9) determining, according to the probability with which the c11-th in-neighbor node z_c11 of node z activates node z, whether node z_c11 successfully activates node z; if so, putting node z_c11 into its activated node set NA(z_c11), putting node z_c11 into the node set V, and going to step (5-10); otherwise, going to step (5-11);
(5-10) determining whether node z_c11 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding node z_c11 to the node set Set_C, and going to step (5-11); otherwise, going directly to step (5-11);
(5-11) determining whether node z_c11 is an element of the second initial node set S; if so, setting d_S = curDist, adding node z_c11 to the node set Set_S, and going to step (5-12); otherwise, going directly to step (5-12);
(5-12) setting counter c11 = c11 + 1 and going to step (5-8);
(5-13) determining whether d_C is equal to curDist; if so, going to step (5-14); otherwise, going to step (5-15);
(5-14) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is given by [Figure FDA0002402710200000091], wherein [Figure FDA0002402710200000092] is the total number of times node z is activated by competing influences and Infset_C(u) is the set of competing influences propagated by a node u in the node set Set_C, and going to step (5-30); otherwise, the adoption of the second initial node set S in this sample is A_c10 = 0, and going to step (5-30);
(5-15) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is A_c10 = 1, and going to step (5-30); otherwise, going to step (5-16);
(5-16) if the node set V is not empty, d_S is not equal to curDist, and d_C is not equal to curDist, putting all nodes in the node set V into the node set R and going to step (5-17); otherwise, going to step (5-27);
(5-17) setting counter c12 = 1 and setting the current distance from node z to curDist = curDist + 1;
(5-18) determining whether the value of c12 is greater than the total number of nodes in the node set V; if so, going to step (5-26); otherwise, going to step (5-19);
(5-19) selecting the c12-th node z_c12 from the node set V and setting counter c13 = 1;
(5-20) determining whether the value of c13 is greater than the total number of in-neighbor nodes of node z_c12; if so, going to step (5-25); otherwise, going to step (5-21);
(5-21) determining whether node z_c13 is not in the node set R and, according to the probability with which the c13-th in-neighbor node z_c13 of node z_c12 activates node z_c12, whether node z_c13 successfully activates node z_c12; if so, putting the elements of the activated node set NA(z_c12) of node z_c12 into the activated node set NA(z_c13) of node z_c13, putting node z_c13 into the node set V_next, and going to step (5-22); otherwise, going to step (5-24);
(5-22) determining whether node z_c13 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding node z_c13 to the node set Set_C, and going to step (5-23); otherwise, going directly to step (5-23);
(5-23) determining whether node z_c13 is an element of the second initial node set S; if so, setting d_S = curDist, adding node z_c13 to the node set Set_S, and going to step (5-24); otherwise, going directly to step (5-24);
(5-24) setting counter c13 = c13 + 1 and returning to step (5-20);
(5-25) setting counter c12 = c12 + 1 and returning to step (5-18);
(5-26) replacing all nodes in the node set V with all nodes in the node set V_next, and returning to step (5-16);
(5-27) determining whether d_C is equal to curDist; if so, going to step (5-28); otherwise, going to step (5-29);
(5-28) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is given by [Figure FDA0002402710200000101], wherein [Figure FDA0002402710200000102] is the size of the union of the activated node sets of the nodes in the set S, and going to step (5-30); otherwise, the adoption of the second initial node set S in this sample is A_c10, and going to step (5-30);
(5-29) determining whether d_S is equal to curDist; if so, the adoption of the second initial node set S in this sample is A_c10 = 1, and going to step (5-30); otherwise, going directly to step (5-30);
(5-30) returning the adoption A_c10.
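The branching on d_C and d_S in sub-steps (5-4) to (5-6), (5-13) to (5-15), and (5-27) to (5-29) follows one pattern: whichever of S and S_C is reached at the smaller reverse distance from node z determines the adoption, and a tie yields a fractional value. The Python sketch below captures only that pattern. The tie formula is an image that is not reproduced in this text, so it is passed in as the parameter tie_value; the function name sample_adoption is hypothetical, and treating the unspecified "otherwise" branch of step (5-28) as 0 is an assumption.

```python
import math


def sample_adoption(d_s, d_c, tie_value):
    """Adoption A_c10 of S in one sample, given the two reverse distances.

    d_s: distance from the sampled node z to S (math.inf if never reached).
    d_c: distance from z to the competing set S_C (math.inf if never reached).
    tie_value: value of the tie formula, supplied by the caller.
    """
    if math.isinf(d_s):
        return 0.0          # S never reached z: A_c10 keeps its initial 0
    if d_s < d_c:
        return 1.0          # S reaches z strictly first
    if d_s == d_c:
        return tie_value    # both reach z at the same level
    return 0.0              # S_C reaches z first (ASSUMED to yield 0)
```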
9. An initial node selection system for realizing influence maximization, comprising:
a first module, configured to construct an influence graph according to the established propagation model, and to generate, according to the constructed influence graph, a first initial node set S_C for propagating other competing influences;
a second module, configured to obtain a threshold T_0 by using the SSA algorithm according to preset precision parameters ε and δ, to perform T_0 reverse adoption samplings on the influence graph obtained by the first module, and to put the T_0 samples obtained into a sample set [Figure FDA0002402710200000111]; wherein the threshold T_0 is calculated by the formula [Figure FDA0002402710200000112], wherein the function is [Figure FDA0002402710200000113] [Figure FDA0002402710200000114], and n represents the total number of nodes in the influence graph;
a third module, configured to iteratively select, according to the sample set [Figure FDA0002402710200000115] returned by the second module, the k nodes with the largest marginal gain from the influence graph as a second initial node set S, and to obtain the biased estimate [Figure FDA0002402710200000116] of the adoption of the second initial node set S, wherein 0<k<n;
a fourth module, configured to set counter c10 = 0 and to set the total adoption of the second initial node set S obtained by the third module to SUM_2 = 0;
a fifth module, configured to perform reverse adoption sampling on the influence graph, to calculate the adoption A_c10 obtained in this sample by the second initial node set S obtained by the third module, and to update the total adoption of the second initial node set S obtained by the third module to SUM_2 = SUM_2 + A_c10;
a sixth module, configured to determine whether counter c10 is less than the threshold T_2 and whether the total adoption SUM_2 of the second initial node set S is less than the threshold T_3; if so, to set c10 = c10 + 1 and return to the fifth module; otherwise, to output the unbiased estimate [Figure FDA0002402710200000117] of the adoption of the second initial node set S and proceed to the seventh module, wherein n represents the total number of nodes in the influence graph;
a seventh module, configured to determine whether the biased estimate [Figure FDA0002402710200000118] of the second initial node set S obtained by the third module and the unbiased estimate [Figure FDA0002402710200000119] of the second initial node set S obtained by the sixth module satisfy [Figure FDA0002402710200000121]; if so, to directly output the second initial node set S as the result and end the process; otherwise, to update the number of samples used by the second module to T_0 = 2*T_0 and return to the second module, wherein ε_1 represents the accuracy parameter.


Also Published As

Publication number Publication date
CN110138619A (en) 2019-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant