CN110138619B

CN110138619B - Initial node selection method and system for realizing influence maximization

Info

Publication number: CN110138619B
Application number: CN201910448351.1A
Authority: CN
Inventors: 周旭; 刘勇刚; 姜文君; 肖国庆; 罗文晟; 李肯立; 李克勤
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2019-05-28
Filing date: 2019-05-28
Publication date: 2020-05-19
Anticipated expiration: 2039-05-28
Also published as: CN110138619A

Abstract

The invention discloses an initial node selection method for realizing influence maximization. Aimed at the scenario in which multiple influences propagate in a social network at the same time, it introduces conformity (herd behavior) into the propagation process and provides, for the resulting conformity-aware propagation model, a reverse adoption sampling method, an initial node selection method, and a method for estimating the adoption of an initial node set. The conformity-aware propagation model describes the propagation process more realistically; the initial node selection method selects initial nodes accurately and efficiently, scales to large networks, and improves the timeliness of initial node selection.

Description

Initial node selection method and system for realizing influence maximization

Technical Field

The invention belongs to the technical field of computer information, and particularly relates to an initial node selection method and system for realizing influence maximization.

Background

The development of the Internet has not only made daily life more convenient but has also changed the way people live and work. With the rise of network applications such as Facebook and microblogs and the spread of mobile terminals, people in different regions, with different beliefs, and belonging to different countries and organizations are connected in online social networks, forming a huge network for exchanging information. Information dissemination in large-scale social networks carries great economic and social value (for example, it can be used for advertising and for publicizing policies), so how to select initial nodes to achieve influence maximization (IM) has become an important research problem in the field of social networks.

Existing initial node selection methods are mainly based on the Independent Cascade (IC) model or the Linear Threshold (LT) model, and determine the influence of initial nodes by simulation, by heuristics, or by reverse influence sampling. However, the models they use have a non-negligible technical problem: because they do not consider conformity (people's tendency to follow the crowd), the models are not realistic enough, so the computed influence values are not accurate enough and the initial nodes finally selected are not ideal.

Disclosure of Invention

In view of the above defects or improvement requirements of the prior art, the invention provides an initial node selection method and system for realizing influence maximization. It aims to solve the technical problem that, because the models adopted by existing initial node selection methods do not consider conformity, the computed influence values are not accurate enough and the selected initial nodes are not ideal. In addition, the propagation model provided by the invention is more realistic and more widely applicable, and the initial node selection method provided by the invention improves the timeliness of initial node selection.

To achieve the above object, according to one aspect of the present invention, there is provided an initial node selection method for maximizing influence, including the steps of:

(1) constructing an influence graph according to the established propagation model, and generating, according to the constructed influence graph, a first initial node set S_C used for propagating the other, competing influences;

(2) according to preset precision parameters ε and δ, obtaining a threshold T₀ by means of the SSA algorithm, performing T₀ rounds of reverse adoption sampling on the influence graph obtained in step (1), and putting the T₀ samples thus obtained into a sample set;

(3) according to the sample set returned in step (2), iteratively selecting from the influence graph the k nodes with the largest marginal gain as a second initial node set S, and obtaining a biased estimate of the adoption of the second initial node set S, where 0 < k < n;

(4) setting a counter c10 = 0, which records the number of times the reverse sampling of step (5) has been performed, and setting the total adoption of the second initial node set S obtained in step (3) to SUM₂ = 0;

(5) performing one reverse sampling on the influence graph, calculating the adoption A_c10 obtained in this sampling by the second initial node set S obtained in step (3), and updating the total adoption of the second initial node set S as SUM₂ = SUM₂ + A_c10;

(6) judging whether the counter c10 is less than a threshold T₂ and the total adoption SUM₂ of the second initial node set S is less than a threshold T₃; if so, setting c10 = c10 + 1 and returning to step (5); otherwise, outputting the unbiased estimate of the adoption of the second initial node set S and proceeding to step (7);

(7) judging whether the biased estimate of the adoption of the second initial node set S obtained in step (3) and the unbiased estimate of the adoption of the second initial node set S obtained in step (6) satisfy the following relation:

If so, directly outputting the second initial node set S as the result and ending the process; otherwise, updating the number of samplings used in step (2) to T₀ = 2*T₀ and returning to step (2).
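The stopping rule of steps (4) to (6) can be sketched as follows. This is only an illustration under stated assumptions: the function name `verify_samples` and the callable `sample_adoption` (standing in for one reverse sampling of step (5)) are hypothetical, and the thresholds T₂ and T₃ are passed in as plain numbers because their formulas are not given in the text above.

```python
def verify_samples(sample_adoption, t2, t3):
    """Repeat step (5) until the test of step (6) fails.

    sample_adoption: hypothetical callable returning the adoption A_c10
                     of the set S in one reverse sampling.
    t2, t3:          thresholds T2 (on the counter) and T3 (on SUM2).
    Returns (number of samplings performed, SUM2).
    """
    c10 = 0        # step (4): counter
    sum2 = 0.0     # step (4): total adoption SUM2
    while True:
        sum2 += sample_adoption()          # step (5)
        if c10 < t2 and sum2 < t3:         # step (6): keep sampling
            c10 += 1
            continue
        return c10 + 1, sum2               # samplings done, total adoption


# Usage with a dummy sampler that always reports adoption 1.0.
count, total = verify_samples(lambda: 1.0, t2=10, t3=3.0)
```

Because the counter starts at 0 and is only incremented when another round is requested, the number of samplings actually performed is c10 + 1 when the loop ends.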

Preferably, the influence graph is represented as G = (V, E, p), where V denotes the set of nodes, E denotes the set of directed edges, and p denotes the function assigning an activation probability to every edge; each edge (u, v) has a pre-assigned activation probability p(u, v) ∈ [0, 1], representing the probability that node u activates node v.

Preferably, the propagation process of the propagation model is as follows:

first, at time 0, the different influences activate the respective initial nodes to start the propagation process. Different influences can select the same node as an initial node;

then, at time t, each node that was activated at time t-1 is in the activated state and attempts to activate its neighbor nodes that have not yet been activated. If an activation attempt succeeds, the newly activated node receives all the influences of the node that activated it; if it fails, the target node remains inactive. Afterwards, each node in the activated state adopts one of the influences it has received, with the probability defined by the conformity adoption function, and finally enters the adopted state;

the conformity adoption function h(u, I_i) is defined as follows:

$$h(u, I_i) = \frac{|NA(u, I_i)|}{\sum_{1 \le j \le |I|} |NA(u, I_j)|}$$

where h(u, I_i) denotes the probability that node u adopts influence I_i, I is the set of all influences, and NA(u, I_i) denotes the set of in-neighbor nodes that activated node u and propagated influence I_i to it.

Finally, the propagation process terminates when no new nodes are activated.

Preferably, a sample in step (2) is the data obtained from a single reverse sampling, and comprises a node set R, the set {d(u) | u ∈ R} of distances between each node in R and the node w, the set {NA(u) | u ∈ R} of activated-node sets of the nodes in R, the distance d_C between the node w and the first initial node set S_C, and the total number of times Time_C that the node w is activated by nodes in the first initial node set S_C.

Preferably, the process of single inverse sampling in step (2) comprises the following sub-steps:

(2-1) randomly selecting a node w from the influence graph, setting the distance d(w) between node w and itself to 0, putting w into a node set R, setting the distance d_C between the first initial node set S_C and node w to ∞, and setting a variable curDist = 0, where curDist denotes the distance between the nodes currently being processed and node w (the current distance for short);

(2-2) judging whether node w is an element of the first initial node set S_C; if so, recording the total number of times Time_C that node w is activated by competing influences, setting the distance d_C between S_C and node w to 0, and proceeding to step (2-17); otherwise proceeding to step (2-3);

(2-3) setting a counter c1 = 1 and the current distance curDist = curDist + 1, and judging whether c1 is larger than the total number of neighbor nodes of node w; if so, proceeding to step (2-17); otherwise proceeding to step (2-4);

(2-4) deciding, according to the probability that the c1-th in-neighbor node w_c1 of node w activates node w, whether node w_c1 successfully activates node w; if so, recording the distance d(w_c1) between node w_c1 and node w as curDist, putting node w_c1 into its activated-node set NA(w_c1), putting node w_c1 into a node set V, and proceeding to step (2-5); otherwise proceeding to step (2-6);

(2-5) judging whether node w_c1 is an element of the first initial node set S_C; if so, setting the distance d_C between S_C and node w to the current distance curDist and proceeding to step (2-6); otherwise proceeding directly to step (2-6);

(2-6) judging whether the value of c1 is equal to the total number of neighbor nodes of node w; if so, proceeding to step (2-8); otherwise proceeding to step (2-7);

(2-7) setting the counter c1 = c1 + 1 and returning to step (2-4);

(2-8) judging whether the node set V and the first initial node set S_C have a non-empty intersection, or whether the node set V is empty; if so, calculating and recording the total number of times Time_C that node w is activated by the competing influences propagated by the nodes in the intersection of V and S_C, and proceeding to step (2-17); otherwise putting all nodes of the node set V into the node set R and proceeding to step (2-9);

(2-9) setting a counter c2 = 1 and the current distance curDist = curDist + 1;

(2-10) selecting the c2-th node w_c2 from the node set V, setting a counter c3 = 1, and judging whether the value of c3 is larger than the total number of neighbor nodes of node w_c2; if so, proceeding to step (2-14); otherwise proceeding to step (2-11);

(2-11) judging whether node w_c3, the c3-th neighbor node of node w_c2, is not in the node set R and, according to the probability that node w_c3 activates node w_c2, whether node w_c3 successfully activates node w_c2; if so, setting the distance d(w_c3) between node w_c3 and node w equal to curDist, putting the elements of the activated-node set NA(w_c2) of node w_c2 into the activated-node set NA(w_c3) of node w_c3, putting node w_c3 into a node set V_next, and proceeding to step (2-12); otherwise proceeding to step (2-13);

(2-12) judging whether node w_c3 is an element of the first initial node set S_C; if so, setting the distance d_C between S_C and node w equal to curDist and proceeding to step (2-13); otherwise proceeding directly to step (2-13);

(2-13) judging whether the value of the counter c3 is larger than the total number of neighbor nodes of node w_c2; if so, proceeding to step (2-14); otherwise setting the counter c3 = c3 + 1 and returning to step (2-11);

(2-14) judging whether the value of the counter c2 is equal to the total number of nodes in the node set V; if so, proceeding to step (2-15); otherwise setting the counter c2 = c2 + 1 and returning to step (2-10);

(2-15) judging whether the node set V_next and the first initial node set S_C have a non-empty intersection, or whether the node set V_next is empty; if so, obtaining the total number of times Time_C that node w is activated by the competing influences propagated by the nodes in the intersection of V_next and S_C, and proceeding to step (2-17); otherwise proceeding to step (2-16);

(2-16) replacing all nodes in the node set V with all nodes in the node set V_next, putting all nodes of the node set V into the node set R, and returning to step (2-8);

(2-17) obtaining the node set R, the set {d(u) | u ∈ R} of distances between each node in R and node w, the set {NA(u) | u ∈ R} of activated-node sets of the nodes in R, the distance d_C between node w and the first initial node set S_C, and the total number of times Time_C that node w is activated by nodes in the first initial node set S_C.
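The level-by-level structure of sub-steps (2-1) to (2-17) can be illustrated with a simplified sketch. It is not the full procedure: the bookkeeping of the activated-node sets NA(u) and of Time_C is omitted, and the names `reverse_sample`, `in_neighbors`, `prob` and `seeds_c` are hypothetical. The sketch only shows how a reverse breadth-first sampling records d(u) for every reached node and stops after the level at which the competing set S_C is first reached.

```python
import random


def reverse_sample(in_neighbors, prob, seeds_c, rng):
    """One simplified reverse sample.

    in_neighbors: dict node -> list of in-neighbor nodes
    prob:         dict (u, v) -> activation probability of edge (u, v)
    seeds_c:      set of nodes in the competing initial node set S_C
    rng:          random.Random instance
    Returns (w, dist, d_c) where dist maps each node of R to d(u).
    """
    w = rng.choice(sorted(in_neighbors))       # root node, step (2-1)
    dist = {w: 0}                              # node set R with distances
    d_c = 0 if w in seeds_c else float("inf")  # step (2-2)
    frontier, cur_dist = [w], 0
    # Expand one level at a time until S_C is reached or nothing is new.
    while frontier and d_c == float("inf"):
        cur_dist += 1
        next_frontier = []
        for v in frontier:
            for u in in_neighbors[v]:
                # u must be new and must win the coin flip on edge (u, v).
                if u not in dist and rng.random() < prob[(u, v)]:
                    dist[u] = cur_dist
                    next_frontier.append(u)
                    if u in seeds_c:
                        d_c = cur_dist   # level is still finished first
        frontier = next_frontier
    return w, dist, d_c


# Usage on a chain a -> b -> c in which every edge always fires.
chain = {"a": [], "b": ["a"], "c": ["b"]}
always = {("a", "b"): 1.0, ("b", "c"): 1.0}
w, dist, d_c = reverse_sample(chain, always, {"a"}, random.Random(0))
```

Because the whole level is processed before the loop condition is re-checked, every node in the returned set has a distance of at most d_C, which matches the comparison of d(u) with d_C used when samples are evaluated.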

Preferably, the process in step (3) of iteratively selecting the k nodes with the largest marginal gain as the initial node set comprises the following sub-steps:

(3-1) initializing the second initial node set S as the empty set, setting a counter c4 = 1, setting the initial total adoption over the sample set of every node in the influence graph to 0, and setting the total adoption of the second initial node set S over the sample set to SUM₁ = 0;

(3-2) setting a counter c5 = 1;

(3-3) taking the c5-th sample R_c5 out of the sample set, and setting a counter c6 = 1;

(3-4) taking the c6-th node w_c6 out of the sample R_c5, and judging whether the distance d(w_c6) between node w_c6 and node w in R_c5 is less than d_C; if so, setting the adoption of node w_c6 on R_c5 to 1, adding 1 to the total adoption of node w_c6 over the sample set, and proceeding to step (3-5); otherwise, setting the adoption of node w_c6 on R_c5 to

$$\frac{|NA(w_{c6})|}{|NA(w_{c6})| + Time_C}$$

adding this value to the total adoption of node w_c6 over the sample set, and proceeding to step (3-5); where NA(w_c6) denotes the activated-node set of node w_c6 in the sample R_c5, and Time_C denotes the total number of times node w of the sample R_c5 is activated by the influences propagated by nodes in the first initial node set S_C.

(3-5) judging whether the counter c6 is smaller than |R_c5|; if so, setting c6 = c6 + 1 and returning to step (3-4); otherwise proceeding to step (3-6);

(3-6) judging whether the counter c5 is smaller than the total number of samples in the sample set; if so, setting c5 = c5 + 1 and returning to step (3-3); otherwise proceeding to step (3-7);

(3-7) using a max-heap to arrange all nodes of the influence graph in descending order of their total adoption over the sample set;

(3-8) judging whether the counter c4 is less than the preset threshold k; if so, taking out of the max-heap the node w_c4 with the largest total adoption over the sample set, adding node w_c4 to the second initial node set S, setting SUM₁ = SUM₁ + A(w_c4), where A(w_c4) is the total adoption of node w_c4 over the sample set, removing node w_c4 from the max-heap, and proceeding to step (3-9); otherwise obtaining the second initial node set S and the biased estimate of its adoption, and proceeding to step (3-22);

(3-9) taking out of the sample set all samples that contain node w_c4 to form a sample subset, and setting a counter c7 = 1;

(3-10) taking out the c7-th sample R_c7 of the sample subset, and judging whether the sample R_c7 has been marked as determined-adopted; if so, proceeding to step (3-21); otherwise proceeding to step (3-11);

(3-11) judging whether the distance d(w_c4) between node w_c4 and node w of the sample R_c7 is less than the distance d_C between the first initial node set S_C and node w in R_c7; if so, marking the sample R_c7 as determined-adopted and proceeding to step (3-12); otherwise proceeding to step (3-15);

(3-12) setting a counter c8 = 1;

(3-13) taking out the c8-th node w_c8 of the sample R_c7, updating the total adoption of node w_c8 over the sample set to its current value minus the adoption of node w_c8 on R_c7, and updating the position of node w_c8 in the max-heap;

(3-14) judging whether the counter c8 is smaller than |R_c7|; if so, setting c8 = c8 + 1 and returning to step (3-13); otherwise proceeding to step (3-21);

(3-15) setting counter c9 to 1;

(3-16) taking out the c9-th node w_c9 of the sample R_c7, and judging whether the distance d(w_c9) between node w_c9 and node w of the sample R_c7 is less than the distance d_C between the first initial node set S_C and node w in R_c7; if so, proceeding to step (3-17); otherwise proceeding to step (3-18);

(3-17) updating the adoption of node w_c9 on R_c7 to its current value minus the adoption of node w_c4 on R_c7, and updating the total adoption of node w_c9 over the sample set to its current value minus the adoption of node w_c4 on R_c7;

(3-18) updating the total adoption of node w_c9 over the sample set to its current value minus the adoption of node w_c9 on R_c7, and then updating the adoption of node w_c9 on R_c7 to

where

is the size of the union of the activated-node sets of the nodes in the set S, i.e., the total number of activations of the second initial node set S on R_c7; and then updating the total adoption of node w_c9 over the sample set again, to its current value plus the updated adoption of node w_c9 on R_c7;

(3-19) updating the position of node w_c9 in the max-heap;

(3-20) judging whether the counter c9 is less than |R_c7|; if so, setting c9 = c9 + 1 and returning to step (3-16); otherwise proceeding to step (3-21);

(3-21) judging whether the counter c7 is smaller than the number of samples in the sample subset; if so, setting c7 = c7 + 1 and returning to step (3-10); otherwise returning to step (3-8);

(3-22) judging whether SUM₁ is greater than a threshold T₁; if so, proceeding to step (4); otherwise updating the number of samplings used in step (2) to T₀ = 2*T₀ and returning to step (2). According to the SSA algorithm, the threshold T₁ is calculated by the formula

where

$\varepsilon_2 = \varepsilon_3 = (1 - 1/e)^{-1}\,\varepsilon / 2$.
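The incremental use of a max-heap in sub-steps (3-7) to (3-14) can be illustrated with a reduced sketch. Several assumptions apply: each sample is represented as a dictionary from node to the adoption of that node on the sample, every sample containing a selected node is treated as determined-adopted (so the partial updates of sub-steps (3-15) to (3-20) are left out), and the names `greedy_select`, `samples` and `settled` are hypothetical. Python's `heapq` is a min-heap, so totals are stored negated, and outdated heap entries are refreshed lazily instead of being re-sorted in place.

```python
import heapq


def greedy_select(samples, k):
    """Pick up to k nodes with the largest remaining total adoption.

    samples: list of dicts, node -> adoption of that node on the sample.
    Returns (chosen nodes in selection order, SUM1).
    """
    total = {}
    for sample in samples:
        for node, adoption in sample.items():
            total[node] = total.get(node, 0.0) + adoption

    heap = [(-t, node) for node, t in total.items()]
    heapq.heapify(heap)                 # negated totals emulate a max-heap
    settled = [False] * len(samples)    # samples already decided
    chosen, sum1 = [], 0.0

    while heap and len(chosen) < k:
        neg_total, node = heapq.heappop(heap)
        if -neg_total != total[node]:
            # Stale entry: the total changed since it was pushed.
            heapq.heappush(heap, (-total[node], node))
            continue
        chosen.append(node)
        sum1 += total[node]
        # Remove the contribution of every sample now decided by `node`.
        for i, sample in enumerate(samples):
            if settled[i] or node not in sample:
                continue
            settled[i] = True
            for other, adoption in sample.items():
                total[other] -= adoption
    return chosen, sum1


# Usage: "a" covers two samples, so it is picked first, then "c".
picked, sum1 = greedy_select(
    [{"a": 1.0, "b": 1.0}, {"a": 1.0}, {"c": 1.0}], k=2
)
```

Only the totals of nodes that share a sample with the selected node are touched, which is the point of the incremental update: totals are never recomputed from the whole sample set.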

Preferably, the adoption f(S) of the second initial node set S is obtained by the following formula:

$$f(S) = n \cdot \frac{1}{T} \sum_{i=1}^{T} A_i$$

where S is the initial node set, n is the total number of nodes in the network, A_i is the adoption obtained in the i-th sampling, T is the number of samplings actually performed, and $\frac{1}{T}\sum_{i=1}^{T} A_i$ is the average adoption over all samplings.
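As a small numerical illustration of this estimate (n multiplied by the average per-sample adoption), the sketch below uses a hypothetical helper `estimate_adoption`; the sample values are made up for the example and are not results from the invention.

```python
def estimate_adoption(n, adoptions):
    """Estimate f(S) as n times the mean adoption over all samplings.

    n:         total number of nodes in the network
    adoptions: list of per-sampling adoptions A_i of the set S
    """
    if not adoptions:
        raise ValueError("at least one sampling is required")
    return n * sum(adoptions) / len(adoptions)


# Four samplings with adoptions 1, 0, 0.5 and 0.5 in a 100-node network.
estimate = estimate_adoption(100, [1, 0, 0.5, 0.5])
```

Here the average adoption is 0.5, so the estimated adoption of S is 50 nodes.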

Preferably, the process in step (5) of performing reverse sampling on the influence graph and calculating the adoption A_c10 obtained in this sampling by the second initial node set S obtained in step (3) comprises the following sub-steps:

(5-1) randomly selecting a node z from the influence graph, putting node z into a node set R, setting the distance d_C between the first initial node set S_C and node z to ∞, setting the distance d_S between the second initial node set S and node z to ∞, setting the current distance curDist = 0, and setting the adoption of the second initial node set S in this sampling to A_c10 = 0;

(5-2) judging whether node z is an element of the first initial node set S_C; if so, setting d_C = 0 and proceeding to step (5-3); otherwise proceeding directly to step (5-3);

(5-3) judging whether node z is an element of the second initial node set S; if so, setting d_S = 0 and proceeding to step (5-4); otherwise proceeding directly to step (5-4);

(5-4) judging whether d_C is equal to curDist; if so, proceeding to step (5-5); otherwise proceeding to step (5-6);

(5-5) judging whether d_S is equal to curDist; if so, setting the adoption A_c10 of the second initial node set S in this sampling to

where Time_C is the total number of times node z is activated by competing influences, and proceeding to step (5-30); otherwise setting the adoption of the second initial node set S in this sampling to A_c10 = 0 and proceeding to step (5-30);

(5-6) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to A_c10 = 1 and proceeding to step (5-30); otherwise proceeding to step (5-7);

(5-7) setting a counter c11 = 1 and the current distance curDist = curDist + 1;

(5-8) judging whether the value of c11 is larger than the total number of neighbor nodes of node z; if so, proceeding to step (5-13); otherwise proceeding to step (5-9);

(5-9) deciding, according to the probability that the c11-th in-neighbor node z_c11 of node z activates node z, whether node z_c11 successfully activates node z; if so, putting node z_c11 into its activated-node set NA(z_c11), putting node z_c11 into a node set V, and proceeding to step (5-10); otherwise proceeding to step (5-11);

(5-10) judging whether node z_c11 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding node z_c11 to a node set Set_C, and proceeding to step (5-11); otherwise proceeding directly to step (5-11);

(5-11) judging whether node z_c11 is an element of the second initial node set S; if so, setting d_S = curDist, adding node z_c11 to a node set Set_S, and proceeding to step (5-12); otherwise proceeding directly to step (5-12);

(5-12) setting the counter c11 = c11 + 1 and returning to step (5-8);

(5-13) judging whether d_C is equal to curDist; if so, proceeding to step (5-14); otherwise proceeding to step (5-15);

(5-14) judging whether d_S is equal to curDist; if so, setting the adoption A_c10 of the second initial node set S in this sampling to

where

is the total number of times node z is activated by competing influences and Infset_C(u) is the set of competing influences propagated by a node u of the node set Set_C, and proceeding to step (5-30); otherwise setting the adoption of the second initial node set S in this sampling to A_c10 = 0 and proceeding to step (5-30);

(5-15) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to A_c10 = 1 and proceeding to step (5-30); otherwise proceeding to step (5-16);

(5-16) if the node set V is not empty, d_S is not equal to curDist, and d_C is not equal to curDist, putting all nodes of the node set V into the node set R and proceeding to step (5-17); otherwise proceeding to step (5-27);

(5-17) setting a counter c12 = 1 and the current distance from node z curDist = curDist + 1;

(5-18) judging whether the value of c12 is larger than the total number of nodes in the node set V; if so, proceeding to step (5-26); otherwise proceeding to step (5-19);

(5-19) selecting the c12-th node z_c12 from the node set V, and setting a counter c13 = 1;

(5-20) judging whether the value of c13 is larger than the total number of neighbor nodes of node z_c12; if so, proceeding to step (5-25); otherwise proceeding to step (5-21);

(5-21) judging whether node z_c13, the c13-th neighbor node of node z_c12, is not in the node set R and, according to the probability that node z_c13 activates node z_c12, whether node z_c13 successfully activates node z_c12; if so, putting the elements of the activated-node set NA(z_c12) of node z_c12 into the activated-node set NA(z_c13) of node z_c13, putting node z_c13 into a node set V_next, and proceeding to step (5-22); otherwise proceeding to step (5-24);

(5-22) judging whether node z_c13 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding node z_c13 to the node set Set_C, and proceeding to step (5-23); otherwise proceeding directly to step (5-23);

(5-23) judging whether node z_c13 is an element of the second initial node set S; if so, setting d_S = curDist, adding node z_c13 to the node set Set_S, and proceeding to step (5-24); otherwise proceeding directly to step (5-24);

(5-24) setting the counter c13 = c13 + 1 and returning to step (5-20);

(5-25) setting the counter c12 = c12 + 1 and returning to step (5-18);

(5-26) replacing all nodes in the node set V with all nodes in the node set V_next, and returning to step (5-16);

(5-27) judging whether d_C is equal to curDist; if so, proceeding to step (5-28); otherwise proceeding to step (5-29);

(5-28) judging whether d_S is equal to curDist; if so, setting the adoption A_c10 of the second initial node set S in this sampling to

and proceeding to step (5-30), where

is the size of the union of the activated-node sets of the nodes in the set S; otherwise setting the adoption of the second initial node set S in this sampling to A_c10 = 0 and proceeding to step (5-30);

(5-29) judging whether d_S is equal to curDist; if so, setting the adoption of the second initial node set S in this sampling to A_c10 = 1 and proceeding to step (5-30); otherwise proceeding directly to step (5-30);

(5-30) returning the adoption A_c10 of this sampling.

According to another aspect of the present invention, there is provided an initial node selection system for maximizing influence, including:

a first module, configured to construct an influence graph according to the established propagation model and to generate, according to the constructed influence graph, a first initial node set S_C used for propagating the other, competing influences;

a second module, configured to obtain a threshold T₀ by means of the SSA algorithm according to preset precision parameters ε and δ, to perform T₀ rounds of reverse adoption sampling on the influence graph obtained by the first module, and to put the T₀ samples thus obtained into a sample set;

a third module, configured to iteratively select from the influence graph, according to the sample set returned by the second module, the k nodes with the largest marginal gain as a second initial node set S, and to obtain a biased estimate of the adoption of the second initial node set S, where 0 < k < n;

a fourth module, configured to set a counter c10 = 0, which records the number of times the reverse sampling of the fifth module has been performed, and to set the total adoption of the second initial node set S obtained by the third module to SUM₂ = 0;

a fifth module, configured to perform one reverse sampling on the influence graph, to calculate the adoption A_c10 obtained in this sampling by the second initial node set S obtained by the third module, and to update the total adoption of the second initial node set S as SUM₂ = SUM₂ + A_c10;

a sixth module, configured to judge whether the counter c10 is less than a threshold T₂ and the total adoption SUM₂ of the second initial node set S is less than a threshold T₃; if so, to set c10 = c10 + 1 and return to the fifth module; otherwise, to output the unbiased estimate of the adoption of the second initial node set S and proceed to the seventh module;

a seventh module, configured to judge whether the biased estimate of the adoption of the second initial node set S obtained by the third module and the unbiased estimate of the adoption of the second initial node set S obtained by the sixth module satisfy the following relation:

If so, the second initial node set S is directly output as the result and the process ends; otherwise, the number of samplings used by the second module is updated to T₀ = 2*T₀ and the process returns to the second module.

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

(1) the conformity-aware propagation model provided by the invention takes into account the competition among multiple influences during propagation and, for the first time, introduces conformity behavior into the influence adoption process as the basis for decision, which makes the propagation model more reasonable and realistic, so that the influence values obtained are more accurate and the initial nodes finally selected are better;

(2) the invention designs a reverse sampling method for the conformity-aware propagation model, which solves the problem that conventional reverse sampling methods are not suitable for sampling with multiple influences, so that the reverse sampling method is more widely applicable;

(3) the initial node selection method provided by the invention updates the adoption gain of nodes incrementally, which avoids a large amount of repeated computation and improves computational efficiency.

Drawings

FIG. 1 is a flow chart of the initial node selection method for realizing influence maximization according to the invention;

FIG. 2 is a state transition diagram of nodes in the conformity-aware propagation model;

FIG. 3 shows an example of the conformity-aware propagation model of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

In view of the defects mentioned in the background, the invention first proposes a conformity-aware propagation model: designed for the propagation of multiple competing influences, it introduces people's conformity behavior into the propagation process to determine which influence is finally adopted. Then, based on the conformity-aware propagation model, the adoption maximization problem is formulated. Finally, a reverse adoption sampling method and an initial node selection method are designed for the conformity-aware propagation model, and an initial node set with a quality guarantee is computed in combination with the SSA framework.

The technical terms appearing in the present invention are explained and illustrated in detail below:

Conformity: the behavior of an individual who, under the influence of the surrounding group, brings his or her own perception, judgment and cognition into line with public opinion or with the majority.

Adoption: the number of nodes in the social network that adopt a given influence, or the expectation of that number.

Conformity adoption function: a probability function defined in the propagation model for scenarios in which multiple competing influences propagate in a social network at the same time. The conformity adoption function satisfies 0 ≤ h(u, I_i) ≤ 1 and $\sum_{1 \le i \le |I|} h(u, I_i) = 1$. It gives the probability with which a node in the social network selects and adopts one influence: the more times a node receives an influence, the higher the probability that it adopts that influence, which reflects the conformity of the node's choice.

In-neighbor node: information propagation between nodes in a social network is directional, so the edges between nodes in the influence graph are also directed. For any node v, if there is a directed edge (u, v) pointing from node u to node v, then node u is an in-neighbor node of node v.

Out-neighbor node: similarly, for any node u, if there is a directed edge (u, v) pointing from node u to node v, then node v is an out-neighbor node of node u.

Conformity-aware propagation model: it specifies the rules by which the various influences propagate in the influence graph. Each node has two behaviors, activation and adoption, which are governed mainly by the activation probability function p(e) and the conformity adoption function h(u, I), respectively.

In the present invention, the activation probability function p(e) can be set in three ways: uniform probability, reciprocal in-degree weighting, and one-of-three selection. In the uniform probability setting, every edge in the network is given the same activation probability, usually 0.1. In the reciprocal in-degree weighting setting, the activation probability of an edge is the reciprocal of the in-degree of its end point, i.e., p(u, v) = 1/|N_in(v)|, where the edge is e = (u, v), v is the end point of the edge, and N_in(v) is the set of all in-neighbors of v. In the one-of-three setting, the activation probability of each edge is selected at random from the three values 0.1, 0.01 and 0.001.
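The three settings can be sketched as follows. The function names and the representation of the graph as a list of directed edge tuples are assumptions made for this illustration only.

```python
import random


def uniform_probabilities(edges, p=0.1):
    """Uniform setting: every edge gets the same probability."""
    return {edge: p for edge in edges}


def weighted_probabilities(edges):
    """Reciprocal in-degree setting: p(u, v) = 1 / |N_in(v)|."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


def one_of_three_probabilities(edges, rng, values=(0.1, 0.01, 0.001)):
    """One-of-three setting: pick one of three values at random per edge."""
    return {edge: rng.choice(values) for edge in edges}


# Usage: node "c" has two in-neighbors, node "d" has one.
edges = [("a", "c"), ("b", "c"), ("c", "d")]
weighted = weighted_probabilities(edges)
mixed = one_of_three_probabilities(edges, random.Random(0))
```

With the reciprocal in-degree setting, the probabilities of the edges entering any node sum to 1, so a node with many in-neighbors is not activated more easily merely because of its degree.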

The crowd-following adoption probability function h(u, I_i) is defined as follows:

$$h(u, I_i) = \frac{\mathrm{Freq}(I_i)}{\sum_{1 \le j \le |I|} \mathrm{Freq}(I_j)}$$

where h(u, I_i) denotes the probability that node u adopts influence I_i, and Freq(I_i) denotes the number of times u has received influence I_i.

The following formula is used in the actual calculation:

$$h(u, I_i) = \frac{|NA(u, I_i)|}{\sum_{1 \le j \le |I|} |NA(u, I_j)|}$$

where NA(u, I_i) denotes the set of in-neighbors that activated u and propagated influence I_i to it.

The crowd-following adoption probability function satisfies $0 \le h(u, I_i) \le 1$ and $\sum_{1 \le i \le |I|} h(u, I_i) = 1$.
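As a minimal sketch, the adoption probability can be computed by normalizing reception counts. This assumes that h(u, I_i) is proportional to Freq(I_i), which is consistent with the constraint that the probabilities sum to 1 and with the worked values 1/4, 2/6 and 1/6 given for fig. 3; the function names below are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def adoption_probabilities(received):
    """Map each influence to its adoption probability.

    received holds one entry per reception, e.g. [1, 1, 4, 4, 2, 3]
    means influences 1 and 4 were received twice, 2 and 3 once.
    """
    freq = Counter(received)
    total = sum(freq.values())
    if total == 0:
        raise ValueError("the node has not received any influence")
    return {influence: Fraction(count, total) for influence, count in freq.items()}


def adopt(received, rng=None):
    """Draw one influence with probability proportional to its frequency."""
    rng = rng or random.Random()
    # A uniform draw over individual receptions is a frequency-weighted
    # draw over distinct influences.
    return rng.choice(list(received))
```

Exact fractions are used only so that the probabilities can be compared with the values in the worked example without rounding.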

As shown in fig. 1, the initial node selection method for maximizing influence according to the present invention includes the following steps:

(1) constructing an influence graph according to the established propagation model and, from the constructed influence graph, generating a first initial node set S_C used to propagate the other, competing influences;

the social network is modeled as a directed influence graph G = (V, E, p), where V denotes the set of nodes, E denotes the set of directed edges, and p denotes the function of pre-assigned activation probabilities of all edges: each edge has a pre-assigned activation probability p(u, v) ∈ [0, 1], which represents the probability that node u activates node v.

As for the first initial node set S_C, assume there are q ≥ 1 influences in the social network. Each influence corresponds to one initial node set, and the initial node sets of different influences may overlap, i.e. one node can serve as an initial node of several influences. The first initial node set S_C can be taken from the real initial nodes of a real network, or set artificially by a heuristic (for example, according to node out-degree). For example, when the number q of competing influences is 2 and 50 initial nodes are set for each competing influence, the 25 nodes with the largest out-degree may be selected as initial nodes common to both competing influences, and the following nodes in descending order of out-degree are then assigned to the two influences in turn, each node being assigned to one influence.
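The out-degree heuristic in the example can be sketched as follows. The round-robin order used to hand out the non-shared nodes, the function name and the parameter names are assumptions made for illustration; the description only states that the remaining nodes are assigned in turn, one influence per node.

```python
def competing_seed_sets(out_degree, q=2, seeds_per_influence=50, shared=25):
    """Build q initial node sets from a {node: out-degree} mapping.

    The `shared` highest out-degree nodes are common to every influence;
    the following nodes are dealt out round-robin, one influence each.
    """
    ranked = sorted(out_degree, key=lambda v: -out_degree[v])
    exclusive = seeds_per_influence - shared
    needed = shared + q * exclusive
    if exclusive < 0 or len(ranked) < needed:
        raise ValueError("not enough nodes for the requested seed sets")

    seed_sets = [list(ranked[:shared]) for _ in range(q)]
    for idx, node in enumerate(ranked[shared:needed]):
        seed_sets[idx % q].append(node)
    return seed_sets
```

With the default arguments this reproduces the numbers in the example: two sets of 50 nodes that share their 25 highest out-degree members.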

Fig. 2 shows the three states of a node and the transitions between them during propagation under the propagation model. A node is initially in the inactive state; it changes to the activated state if it is successfully activated by a neighbor node, and otherwise remains inactive. When an activated node has received the influences propagated by one or more nodes, it selects and adopts one of them; its state then becomes adopted and stays so until the propagation process ends.

The propagation process of the propagation model is further explained below in conjunction with fig. 3. In fig. 3, circles represent nodes, lines with arrows represent directed edges, and the numbers in a circle represent the influences received by that node. The sectors of a circle indicate the probabilities of adopting the corresponding influences, bold arrowed lines indicate edges on which activation is being attempted, and dashed lines indicate failed activations.

1) At time 0, the different influences activate the respective initial nodes to start the propagation process. The different influences may select the same node as the initial node.

In fig. 3, at time t = 0, three nodes a, b and c are activated as initial nodes: a and c are activated by influences 1 and 4, respectively, and node b is activated by three influences, namely influences 2, 3 and 4.

2) At time t, the nodes that were activated at time t-1 are in the activated state and attempt to activate their inactive neighbor nodes. If an activation attempt succeeds, the newly activated node receives all the influences of the node that activated it (the same inactive node may be activated by several activated nodes and thus receive several influences, and the same influence may also be received by the same node several times); if the attempt fails, the inactive node remains inactive. After this, a node in the activated state adopts one of the influences it has received, according to the probability defined by the crowd-following adoption function, and finally enters the adopted state.

The probability defined by the crowd-following adoption function h(u, I_i) is as follows:

$$h(u, I_i) = \frac{\mathrm{Freq}(I_i)}{\sum_{1 \le j \le |I|} \mathrm{Freq}(I_j)}$$

where h(u, I_i) denotes the probability that node u adopts influence I_i, I is the set of all influences, and Freq(I_i) denotes the number of times node u has received influence I_i. The following formula is used in the actual calculation:

$$h(u, I_i) = \frac{|NA(u, I_i)|}{\sum_{1 \le j \le |I|} |NA(u, I_j)|}$$

where NA(u, I_i) denotes the set of neighbor nodes that activated node u and propagated influence I_i to it.

The crowd-following adoption probability function satisfies $0 \le h(u, I_i) \le 1$ and $\sum_{1 \le i \le |I|} h(u, I_i) = 1$. In fig. 3, at time t = 1, the three nodes a, b and c are in the activated state and attempt to activate the three inactive nodes d, e and f. Node c fails to activate e, while node e is activated by both nodes a and b and receives the four influences carried by these two nodes. Finally, the three nodes d, e and f each select one of the influences they received. Node e receives four influences, each only once, so each of them is adopted with probability 1/4. At time t = 2, because node e is already in the adopted state, node d cannot activate e. Node g is activated by three nodes at the same time and receives four influences, of which influences 1 and 4 are received twice and the other two once. Therefore influences 1 and 4 are each adopted with the higher probability 2/6, and the remaining two influences are each adopted with probability 1/6.

3) The propagation process terminates when no new nodes are activated.

In fig. 3, no new node can be activated after time t = 2, and the propagation process ends.
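The three propagation stages can be sketched as a small forward simulation. Several points are assumptions rather than statements of the invention: an initial node forwards every influence it was seeded with, a later node forwards only the influence it adopted, initial nodes are not listed in the result, and the dictionary-based graph representation and all names are hypothetical.

```python
import random
from collections import defaultdict


def simulate(out_neighbors, prob, seeds, rng=None):
    """Run one propagation and return {node: adopted influence}.

    out_neighbors: {u: [v, ...]}; prob: {(u, v): p(u, v)};
    seeds: {influence: iterable of initial nodes}.
    """
    rng = rng or random.Random()
    carried = defaultdict(list)  # node -> influences it forwards
    for influence, nodes in seeds.items():
        for node in nodes:
            carried[node].append(influence)

    frontier = list(carried)     # nodes activated in the previous step
    reached = set(frontier)      # activated or adopted nodes
    adopted = {}
    while frontier:              # stops when no new node is activated
        received = defaultdict(list)
        for u in frontier:
            for v in out_neighbors.get(u, ()):
                if v in reached:
                    continue     # only inactive nodes can be activated
                if rng.random() < prob[(u, v)]:
                    received[v].extend(carried[u])
        frontier = []
        for v, influences in received.items():
            # Uniform draw over receptions = frequency-weighted adoption.
            choice = rng.choice(influences)
            adopted[v] = choice
            carried[v] = [choice]
            reached.add(v)
            frontier.append(v)
    return adopted
```

Averaging the number of nodes that adopt a given influence over many such runs gives a Monte Carlo view of the quantity that the sampling steps below estimate more efficiently.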

(2) Obtaining a threshold T₀ using the SSA algorithm according to preset precision parameters ε and δ, performing reverse adoption sampling T₀ times on the influence graph obtained in step (1), and putting the T₀ samples so obtained into a sample set

Wherein, 0 < epsilon < 1,0 < delta < 1;

specifically, the threshold value T₀Is calculated by the formula

Wherein the function

n represents the total number of nodes in the impact graph,

In this step, a sample is the data obtained by a single round of reverse adoption sampling, and specifically includes: the node set R; the set of distances {d(u) | u ∈ R} between the node w and each node in the node set R; the set {NA(u) | u ∈ R} of activated-node sets of the nodes in the node set R; the distance d_C between the node w and the first initial node set S_C; and the total number of times Time_C that the node w is activated by nodes in the first initial node set S_C.

The process of sampling reversely adopted in the step comprises the following substeps:

(2-1) randomly selecting a node w from the influence graph, setting the distance d(w) between the node w and itself to 0, putting w into the node set R, initializing the distance d_C between the initial node set S_C and the node w, and setting a variable curDist to 0, where curDist represents the distance between the currently processed node and the node w and is called the current distance for short;

(2-5) judging whether the node w_c1 is an element of the first initial node set S_C; if so, setting the distance d_C between the initial node set S_C and the node w to the current distance curDist and going to step (2-6); otherwise, going directly to step (2-6);

(2-7) setting the counter c1 = c1 + 1, and returning to step (2-4);

(2-8) judging whether the node set V and the first initial node set S_C have an intersection, or whether the node set V is an empty set; if so, calculating and recording the total number of times Time_C that the node w is activated by the competing influences propagated by the nodes in the intersection of V and S_C, and going to step (2-17); otherwise, putting all the nodes in the node set V into the node set R and going to step (2-9);

(2-9) setting the counter c2 = 1 and the current distance curDist = curDist + 1;

(2-12) judging whether the node w_c3 is an element of the first initial node set S_C; if so, setting the distance d_C between the initial node set S_C and the node w equal to curDist and going to step (2-13); otherwise, going directly to step (2-13);
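The layer-by-layer structure of these sub-steps resembles a reverse breadth-first search from the randomly chosen node w. The following sketch covers only that skeleton, under stated assumptions: each in-edge is tried once with its activation probability, the search stops once a layer contains a node of S_C or is empty, and the nodes of that last layer are kept in the sample. It does not compute NA(u) or Time_C, and all names, as well as the optional `root` argument, are hypothetical.

```python
import random


def reverse_sample(nodes, in_neighbors, prob, competitor_seeds, rng=None, root=None):
    """Return the root w, the distances d(u) of reached nodes, and d_C.

    in_neighbors: {v: [u, ...]}; prob: {(u, v): p(u, v)};
    competitor_seeds: the first initial node set S_C.
    """
    rng = rng or random.Random()
    w = root if root is not None else rng.choice(list(nodes))
    dist = {w: 0}                      # plays the role of R and d(u)
    d_c = 0 if w in competitor_seeds else float("inf")
    layer, cur_dist = [w], 0
    while layer and d_c == float("inf"):
        cur_dist += 1
        next_layer = []
        for v in layer:
            for u in in_neighbors.get(v, ()):
                if u in dist:
                    continue           # already reached at a smaller distance
                if rng.random() < prob[(u, v)]:
                    dist[u] = cur_dist
                    next_layer.append(u)
                    if u in competitor_seeds:
                        d_c = cur_dist
        layer = next_layer
    return {"root": w, "dist": dist, "d_C": d_c}
```

Stopping at the first layer that touches S_C keeps each sample small, because nodes farther from w than the nearest competing initial node cannot reach w first.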

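(3) According to the sample set returned in step (2), iteratively selecting the k nodes with the largest marginal gain from the influence graph as a second initial node set S, and obtaining a biased estimate of the adoption of the second initial node set S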

Wherein 0<k<n；

Because the influence graph is a probability graph, activation and adoption are random, so the number of nodes adopting a given influence is also random. Thus, for the propagation of multiple influences in a probabilistic directed network, the adoption expectation function f(S) is defined as the expected number of nodes that adopt a given influence when the node set S is used as its initial node set. Since calculating the adoption f(S) of the second initial node set S is an NP-hard problem, f(S) is estimated from the samples obtained in step (2), with the following calculation formula:

is the number of actual samples and is,

is that

Average of all samples taken.

Using the expected number of adopting nodes measures the gain of the initial nodes more reasonably and scientifically.

In this step, the specific process of iteratively selecting from the influence graph the k nodes with the largest marginal gain as the initial node set is as follows:

(3-1) setting

All the initial values adopted in (1) are 0, and the second initial node set S is in the sample set

SUM of adopted SUM of above₁＝0；

(3-2) setting the counter c5 = 1;

(3-3) from the sample set

Taking out the c5 th sample R_c5Setting counter c6 to 1;

(3-4) taking the c6-th node w_c6 out of the sample R_c5 and judging whether the distance d(w_c6) between the node w_c6 and the node w in R_c5 is less than d_C; if so, setting the adoption of the node w_c6 on R_c5 to 1, adding 1 to the total adoption of the node w_c6, and going to step (3-5); otherwise, setting the adoption of the node w_c6 on R_c5 to

Updating w_c6In a sample set

Is the node w_c6In a sample set

Total adoption of (1) plus

(3-6) judging whether or not the counter c5 is smaller than

The total adopted sizes of the above are arranged in descending order;

Then, the process goes to the step (3-22).

(3-9) from the sample set

Take out all contained nodes w_c4Forming a sample set

And sets counter c7 to 1;

(3-10) taking out

C7 th sample R in (1)_c7And judging the sample R_c7Whether the mark is used for determining adoption, if yes, going to the step (3-21), otherwise, going to the step (3-11);

(3-12) setting counter c8 to 1;

(3-13) taking out sample R_c7C8 th node in (1), update w_c8In a sample set

Is the node w_c8In a sample set

(3-15) setting counter c9 to 1;

Is the node w_c9In a sample set

Total sum of above minus node w_c4At R_c7The above method is adopted.

(3-18) updating node w_c9In a sample set

Is the node w_c9In a sample set

Wherein

Is the node w_c9In a sample set

Total adoption of (1) plus node w_c9At R_c7And adopting after updating.

(3-19) updating the ordering of the node w_c9 in the big root heap.

(3-21) judging whether or not the counter c7 is smaller than

Wherein

ε₂＝ε₃＝(1-1/e)^-1ε/2, the remaining variables are the same as in step (2).

The adoption function f(S) is non-negative, monotone non-decreasing and submodular. Therefore, the adoption of S calculated using a greedy strategy

Wherein

Is in a sample set

Upper optimal set of initial nodes S⁺Sampling of (3).

In the above process, the adoption of every node u in each sample R of the sample set is first calculated on that sample, and the adoptions of each node over the different samples R are then accumulated to obtain the total adoption of each node of the influence graph on the sample set (steps 3-1 to 3-6). A big root heap (max-heap) is then built from the total adoptions of the nodes on the sample set, ordered by total adoption in descending order, so that the top of the heap is the node with the largest total adoption. The top node, i.e. the node with the largest adoption, is selected as a new initial node. Because the adoption ranges of nodes overlap, the adoption of nodes that lie in the same sample R as the newly selected initial node decreases, so the gains of these nodes must be updated and their positions in the big root heap adjusted. Selecting the top node and updating the gains of the other nodes is repeated until k nodes have been selected (steps 3-7 to 3-21).
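The select-then-update loop can be illustrated with a simplified greedy selection over samples. This sketch deliberately replaces the per-sample adoption values of the invention with 0/1 coverage (a sample counts once, for the first selected node it contains), and it refreshes heap entries lazily instead of repositioning them in place; both choices, like the function name, are assumptions made to keep the example short.

```python
import heapq


def greedy_select(samples, k):
    """Pick up to k nodes by largest remaining gain over a list of node sets.

    Returns (selected nodes in order, number of samples covered).
    Nodes are assumed to be mutually comparable (e.g. all strings).
    """
    gain, member_of = {}, {}
    for idx, sample in enumerate(samples):
        for node in sample:
            gain[node] = gain.get(node, 0) + 1
            member_of.setdefault(node, []).append(idx)

    # heapq is a min-heap, so gains are negated to emulate a max-heap.
    heap = [(-g, node) for node, g in gain.items()]
    heapq.heapify(heap)
    covered = [False] * len(samples)
    selected, total = [], 0
    while heap and len(selected) < k:
        neg_gain, node = heapq.heappop(heap)
        if -neg_gain != gain[node]:
            # Stale entry: reinsert with the node's current gain.
            heapq.heappush(heap, (-gain[node], node))
            continue
        selected.append(node)
        total += gain[node]
        for idx in member_of[node]:
            if not covered[idx]:
                covered[idx] = True
                for other in samples[idx]:
                    gain[other] -= 1   # overlapping nodes lose this sample
    return selected, total
```

Because gains only decrease, a heap entry whose stored gain still matches the current gain is guaranteed to be the true maximum, which is what makes the lazy refresh safe.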

(5) Performing reverse adoption sampling on the influence graph, calculating the adoption A_c10 obtained in the current sampling by the second initial node set S obtained in step (3), and updating the total adoption of the second initial node set S obtained in step (3): SUM₂ = SUM₂ + A_c10.

In this step, performing reverse adoption sampling on the influence graph and calculating the adoption A_c10 obtained in the current sampling by the second initial node set S obtained in step (3) comprises the following sub-steps:

(5-1) randomly selecting a node z from the influence graph, placing the node z in a node set R, and setting the distance d_C between the first initial node set S_C and the node z to infinity, the distance d_S between the second initial node set S and the node z to infinity, the current distance curDist = 0, and the adoption of the second initial node set S in this sampling A_c10 = 0;

(5-7) setting the counter c11 = 1 and the current distance curDist = curDist + 1;

(5-10) judging whether the node z_c11 is an element of the first initial node set S_C; if so, setting d_C = curDist, adding the node z_c11 to the node set Set_C, and going to step (5-11); otherwise, going directly to step (5-11);

(5-12) setting the counter c11 = c11 + 1, and going to step (5-8);

Wherein

(5-15) judging whether d_S is equal to curDist; if so, setting the adoption A_c10 of the second initial node set S in this sampling to 1 and going to step (5-30); otherwise, going to step (5-16);

(5-24) setting the counter c13 = c13 + 1, and returning to step (5-20);

(5-25) setting the counter c12 = c12 + 1, and returning to step (5-18);

Then, the process proceeds to step (5-30), wherein

(5-30) returning the adoption A_c10.

And entering step (7);

according to the SSA algorithm, the threshold T₂Is calculated by the formula

Threshold value T₃Is calculated by the formula

The variables in the formula are exactly the same as described in steps (2) and (3-22).

Unbiased estimation with the second set of initial nodes S obtained in step (6)

Whether or not to satisfy

If so, the second initial node set S is output directly as the result and the process ends; otherwise, the number of samples used in step (2) is updated to T₀ = 2*T₀ and the process returns to step (2).
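The overall control flow, sample, select, check, and double the number of samples on failure, can be sketched as a small driver. The three callables stand in for steps (2), (3) and (4) to (7) and are hypothetical placeholders, as is the `max_rounds` safeguard, which is not part of the described method.

```python
def select_with_doubling(t0, draw_samples, select_seeds, passes_check, max_rounds=20):
    """Repeat sampling and selection, doubling the sample count on failure.

    draw_samples(t)          -> list of t reverse samples          (step 2)
    select_seeds(samples)    -> (seed set, biased estimate)         (step 3)
    passes_check(seeds, est) -> True if the estimate is accepted    (steps 4-7)
    Returns (seed set, number of samples used in the final round).
    """
    t = t0
    for _ in range(max_rounds):
        samples = draw_samples(t)
        seeds, estimate = select_seeds(samples)
        if passes_check(seeds, estimate):
            return seeds, t
        t *= 2  # T0 = 2 * T0, then back to step (2)
    raise RuntimeError("no seed set passed the check within max_rounds")
```

Doubling means that the total number of samples drawn over all rounds is less than twice the number drawn in the final, accepted round.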

In summary, the crowd-aware propagation model provided by the invention takes into account the competition among multiple influences during propagation and, for the first time, introduces crowd-following behavior into the influence adoption process as a basis for decisions, which makes the propagation model more reasonable and scientific. The reverse adoption sampling method designed for the crowd-aware propagation model solves the problem that conventional reverse sampling methods are not suitable for multi-influence sampling, and widens the range of application of the method. The initial node selection method updates the adoption gains of the nodes incrementally, which avoids a large amount of repeated calculation and improves computational efficiency.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. An initial node selection method for realizing influence maximization is characterized by comprising the following steps:

(2) obtaining a threshold T₀ using the stop-and-stare (SSA) algorithm according to preset precision parameters ε and δ, performing reverse adoption sampling T₀ times on the influence graph obtained in step (1), and putting the T₀ samples so obtained into a sample set

wherein the threshold T₀ is calculated by the formula

Wherein the function

n represents the total number of nodes in the impact graph;

(3) according to the sample set returned in the step (2)

iteratively selecting the k nodes with the largest marginal gain from the influence graph as a second initial node set S, and obtaining a biased estimate of the adoption of the second initial node set S

Wherein 0<k<n；

(4) setting the counter c10 = 0, and setting the total adoption of the second initial node set S obtained in step (3) to SUM₂ = 0;

(5) performing reverse adoption sampling on the influence graph, calculating the adoption A_c10 obtained in the current sampling by the second initial node set S obtained in step (3), and updating the total adoption of the second initial node set S obtained in step (3): SUM₂ = SUM₂ + A_c10;

(6) determining whether the counter c10 is less than the threshold T₂ and the total adoption SUM₂ of the second initial node set S is less than the threshold T₃; if so, setting c10 = c10 + 1 and returning to step (5); otherwise, outputting the unbiased estimate

And entering step (7), wherein n represents the total number of nodes in the influence graph;

Unbiased estimation with the second set of initial nodes S obtained in step (6)

Whether or not to satisfy

If so, directly outputting the second initial node set S as the result and ending the process; otherwise, updating the number of samples used in step (2) to T₀ = 2*T₀ and returning to step (2), wherein ε₁ represents the weight.

2. The initial node selection method according to claim 1, wherein the influence graph is represented as G = (V, E, p), where V denotes the set of nodes, E denotes the set of directed edges, and p denotes the function of pre-assigned activation probabilities of all edges, each edge having a pre-assigned activation probability p(u, v) ∈ [0, 1] that represents the probability that node u activates node v.

3. The initial node selection method of claim 1, wherein the propagation process of the propagation model is as follows:

firstly, at the time 0, different influences activate respective initial nodes to start a propagation process; different influences can select the same node as an initial node;

wherein h(u, I_i) denotes the probability that node u adopts influence I_i, I is the set of all influences, and NA(u, I_i) denotes the set of neighbor nodes that activated node u and propagated influence I_i to it;

finally, the propagation process terminates when no new nodes are activated.

4. The initial node selection method of claim 1,

each of the T₀ samples in step (2) is the data obtained by a single round of reverse adoption sampling, and comprises: the node set R; the set of distances {d(u) | u ∈ R} between the node w and each node in the node set R; the set {NA(u) | u ∈ R} of activated-node sets of the nodes in the node set R; the distance d_C between the node w and the first initial node set S_C; and the total number of times Time_C that the node w is activated by nodes in the first initial node set S_C.

5. The initial node selection method according to any one of claims 1 to 4, wherein the process of a single round of reverse adoption sampling in step (2) comprises the following sub-steps:

(2-7) setting the counter c1 = c1 + 1, and returning to step (2-4);

(2-8) judging whether the node set V and the first initial node set S_C have an intersection, or whether the node set V is an empty set; if so, calculating and recording the total number of times Time_C that the node w is activated by the competing influences propagated by the nodes in the intersection of V and S_C, and going to step (2-17); otherwise, putting all the nodes in the node set V into the node set R and going to step (2-9);

(2-9) setting the counter c2 = 1 and the current distance curDist = curDist + 1;

(2-15) judging whether the node set V_next and the first initial node set S_C have an intersection, or whether the node set V_next is an empty set; if so, obtaining the total number of times Time_C that the node w is activated by the competing influences propagated by the nodes in the intersection of V_next and S_C, and going to step (2-17); otherwise, going to step (2-16);

6. The initial node selection method according to claim 5, wherein iteratively selecting the k nodes with the largest marginal gain as the initial node set in step (3) comprises the following sub-steps:

(3-1) setting

SUM of adopted SUM of above₁＝0；

(3-2) setting the counter c5 = 1;

(3-3) from the sample set

Taking out the c5 th sample R_c5Setting counter c6 to 1;

Updating w_c6In a sample set

Is the node w_c6In a sample set

Total adoption of (1) plus

and then going to step (3-5); wherein NA(w_c6) and Time_C respectively denote the activated-node set in the sample R_c5 and the total number of times that the node w of the sample R_c5 is activated by influences propagated by nodes in the first initial node set S_C;

(3-6) judging whether or not the counter c5 is smaller than

The total adopted sizes of the above are arranged in descending order;

Then, the step (3-22) is carried out;

(3-9) from the sample set

Take out all contained nodes w_c4Forming a sample set

And sets counter c7 to 1;

(3-10) taking out

(3-11) judging whether the distance d(w_c4) between w_c4 and the node w of the sample R_c7 is less than the distance d_C between the initial node set S_C and the node w in R_c7; if so, marking the sample R_c7 as adoption-determined and going to step (3-12); otherwise, going to step (3-15);

(3-12) setting counter c8 to 1;

(3-13) taking out sample R_c7C8 th node in (1), update w_c8In a sample set

Is the node w_c8In a sample set

Total sum of above minus node w_c8At R_c7And update the node w_c8Sorting in a big root heap;

(3-15) setting counter c9 to 1;

Is the node w_c9In a sample set

Total sum of above minus node w_c4At R_c7The above steps are adopted;

(3-18) updating node w_c9In a sample set

Is the node w_c9In a sample set

Wherein

Is the node w_c9In a sample set

Total adoption of (1) plus node w_c9At R_c7Adopting after updating;

(3-19) updating the ordering of the node w_c9 in the big root heap;

(3-21) judging whether or not the counter c7 is smaller than

(3-22) judging whether SUM₁ is greater than the threshold T₁; if so, going to step (4); otherwise, updating the number of samples used in step (2) to T₀ = 2*T₀ and returning to step (2).

7. The initial node selection method according to claim 6, wherein the adoption f(S) of the second initial node set S is obtained by the following formula:

is the number of actual samples and is,

is that

Average of all samples taken.

8. The initial node selection method according to claim 7, wherein in step (5) performing reverse adoption sampling on the influence graph and calculating the adoption A_c10 obtained in the current sampling by the second initial node set S obtained in step (3) comprises the following sub-steps:

(5-2) judging whether the node z is an element of the first initial node set S_C; if so, setting d_C = curDist and going to step (5-3); otherwise, going directly to step (5-3);

(5-6) judging whether d_S is equal to curDist; if so, setting the adoption A_c10 of the second initial node set S in this sampling to 1 and going to step (5-30); otherwise, going to step (5-7);

(5-7) setting the counter c11 = 1 and the current distance curDist = curDist + 1;

(5-11) judging whether the node z_c11 is an element of the second initial node set S; if so, setting d_S = curDist, adding the node z_c11 to the node set Set_S, and going to step (5-12); otherwise, going directly to step (5-12);

(5-12) setting the counter c11 = c11 + 1, and going to step (5-8);

Wherein

(5-20) judging whether the value of c13 is greater than the total number of in-neighbor nodes of the node z_c12; if so, going to step (5-25); otherwise, going to step (5-21);

(5-24) setting the counter c13 = c13 + 1, and returning to step (5-20);

(5-25) setting the counter c12 = c12 + 1, and returning to step (5-18);

Then, the process proceeds to step (5-30), wherein

(5-29) judging whether d_S is equal to curDist; if so, setting the adoption A_c10 of the second initial node set S in this sampling to 1 and going to step (5-30); otherwise, going directly to step (5-30);

(5-30) returning the adoption A_c10.

9. An initial node selection system for realizing influence maximization, comprising:

Wherein the function

n represents the total number of nodes in the impact graph;

a third module for returning the sample set according to the second module

Wherein 0<k<n；

a fourth module, configured to set the counter c10 = 0 and to set the total adoption of the second initial node set S obtained by the third module to SUM₂ = 0;

a sixth module, configured to determine whether the counter c10 is less than the threshold T₂ and the total adoption SUM₂ of the second initial node set S is less than the threshold T₃; if so, to set c10 = c10 + 1 and return to the fifth module; otherwise, to output the unbiased estimate of the adoption of the second initial node set S

Then entering a seventh module, wherein n represents the total number of nodes in the impact graph;

Whether or not to satisfy

If so, the second initial node set S is output directly as the result and the process ends; otherwise, the number of samples used by the second module is updated to T₀ = 2*T₀ and the process returns to the second module, wherein ε₁ represents the accuracy parameter.