CN104616200B

CN104616200B - A method for selecting initial nodes for influence maximization based on node characteristics

Info

Publication number: CN104616200B
Application number: CN201510072839.0A
Authority: CN
Inventors: 邓晓衡; 潘琰; 曹德娟; 朱从旭; 林立新; 沈海澜; 李登
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2015-02-11
Filing date: 2015-02-11
Publication date: 2017-10-10
Anticipated expiration: 2035-02-11
Also published as: CN104616200A

Abstract

The present invention proposes a method, based on node characteristics, for selecting the initial nodes for influence maximization in an online social network. First, node characteristics are evaluated in terms of three factors: user activity, user sensitivity, and user cohesion. On this basis, the credit between nodes is redefined and distributed; the size of the credit between two nodes reflects the influence between them. If two adjacent nodes perform the same behavior in succession, the latter is considered to have been influenced by the former, and credit is assigned to the former. The network structure and the user behavior log are then combined to compute the credit between any two nodes in the network, and a greedy algorithm repeatedly selects the node with the largest marginal gain to form the initial node set for influence maximization. The invention overcomes the drawback of earlier rules that evaluate node influence by node degree alone, reduces running time and memory consumption, and describes and predicts the influence propagation process more realistically and effectively.

Description

A method for selecting initial nodes for influence maximization based on node characteristics

Technical field

The invention belongs to the field of computer technology and relates to a method for selecting initial nodes for influence maximization based on node characteristics.

Background art

The development of the Internet has not only brought us a more convenient way of life but has also profoundly changed the way we exchange information and communicate. With the development of online social networks, the ways in which we make friends and share knowledge have become richer and more varied. As more and more people use convenient data exchange services such as mobile terminals, our social structures and social relationship networks have become more complex and more tightly knit. Relationships between people in a social group are usually modeled as a graph: nodes represent individuals, and edges or arcs represent the relationships between them. Through the relationships between users in an online social network, information can spread very quickly at minimal cost. For this reason, the propagation and distribution of influence in social networks brings unprecedented opportunities and challenges to viral marketing, and how to find an initial group of users that maximizes the eventual spread of information has become an active research topic.

For the influence maximization problem, most current research either optimizes the classical influence cascade models or improves the accuracy of heuristic algorithms. Influence is assessed mainly from network structure and node degree; the characteristics of users themselves and the behavioral similarity between users are seldom mined and applied to the assessment of node influence.

To address these shortcomings, we propose a scheme for evaluating the initial influence of nodes. The method evaluates the influence between nodes by combining user characteristics, the association between users and behaviors, and the behavioral similarity between users. We then distribute credit according to the proposed evaluation criterion for influence between nodes and apply a greedy algorithm to obtain the node set that maximizes initial influence.

Summary of the invention

The present invention proposes a more realistic and effective method, based on node characteristics, for selecting the initial nodes for influence maximization. When evaluating the influence between nodes, it combines user activity, user sensitivity, and user cohesion; it computes the magnitude of user influence between nodes according to temporal characteristics; it combines the network structure and the user behavior log to distribute credit and to model the influence propagation process; and finally it uses a greedy algorithm to select the nodes with the largest marginal gain, yielding the node set that maximizes initial influence. The specific steps are as follows:

Step 1: Process the online social network data set to obtain a real user behavior log and a network structure file;

Step 2: Traverse the user behavior log and, for each node in the network, compute the user activity, user sensitivity, and user cohesion. For a node u, the user activity act(u) is defined as:

The quantities in this definition are the number of behaviors performed by node u, the number of behaviors that node u performed passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0,1). The user sensitivity is defined as follows:

The quantities in this definition are the time at which behavior a was first performed among all neighbor nodes of node u; t_u(a), the time at which node u was eventually influenced and performed the same behavior a; and τ_u, the average delay between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the sensitivity value. The user cohesion p_{v,u} is computed as follows:

The two quantities in this formula are the union of the sets of behavior types performed by node u and node v, and the intersection of those sets;
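The cohesion described in Step 2 is built from the union and the intersection of the two users' behavior-type sets. A minimal sketch, assuming the two are combined as a Jaccard-style ratio (intersection size over union size); the function names and the `(user, behavior, time)` log layout are illustrative and not taken from the patent:

```python
def behavior_sets(log):
    """Group an iterable of (user, behavior, time) records into one set of behaviors per user."""
    sets = {}
    for user, behavior, _time in log:
        sets.setdefault(user, set()).add(behavior)
    return sets


def cohesion(actions_u, actions_v):
    """Overlap of two users' behavior-type sets (assumed form: |intersection| / |union|)."""
    union = actions_u | actions_v
    if not union:
        return 0.0  # two users with no recorded behavior share nothing
    return len(actions_u & actions_v) / len(union)


log = [("u", "a1", 1), ("u", "a2", 3), ("v", "a1", 0), ("v", "a3", 2)]
sets = behavior_sets(log)
p_vu = cohesion(sets["u"], sets["v"])  # 1 shared type out of 3 distinct types
```

The ratio is symmetric in u and v, so under this assumption p_{v,u} and p_{u,v} coincide.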

Step 3: Normalize user sensitivity and user cohesion respectively. The average user sensitivity and the average user cohesion of node u are computed as follows; the average user cohesion is

$$\hat{p}_u = \frac{1}{|N(u)|} \sum_{v \in N(u)} p_{v,u}$$

In these rules, the behaviors considered are those recorded in the training set, N(u) denotes the set of nodes adjacent to node u, and v ∈ N(u). The initial user influence is defined as:

Given the average delay τ_{v,u} between node u and its neighbor node v and the initial user influence iniful(u) of node u, the influence between adjacent nodes v and u is transformed with a continuous decay function, computed as follows:

$$\Psi^{v,u}_{u \in N_{out}(v)}(a) = iniful(v) \cdot \exp\left(-\frac{\left(t_u(a) - t_v^0(a)\right)^2}{\sqrt{4\pi} \cdot \tau_{v,u}}\right)$$

Here Ψ^{v,u}(a) denotes the user influence of node v on node u for behavior a; t_v^0(a) is the time at which node v performed behavior a; N_out(v) denotes the set of out-neighbors of node v, with u ∈ N_out(v); and t_u(a) denotes the time at which node u was eventually influenced and performed the same behavior a;
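As a sketch of the continuous decay, the function below evaluates the expression stated in claim 1, iniful(v) · exp(−(t_u(a) − t_v^0(a))² / (√(4π) · τ_{v,u})). The function and argument names are hypothetical, and rejecting a non-positive delay is an added assumption:

```python
import math


def pairwise_influence(iniful_v, t_u, t_v, tau_vu):
    """Influence of v on u for one behavior, decayed by the time gap between their actions."""
    if tau_vu <= 0:
        # Assumption, not from the source: the average delay must be positive.
        raise ValueError("average delay must be positive")
    gap = t_u - t_v
    return iniful_v * math.exp(-(gap * gap) / (math.sqrt(4 * math.pi) * tau_vu))


same_time = pairwise_influence(0.8, t_u=5.0, t_v=5.0, tau_vu=2.0)  # no gap, so no decay
later = pairwise_influence(0.8, t_u=9.0, t_v=5.0, tau_vu=2.0)      # larger gap, smaller value
```

Because the gap is squared, the value falls off quickly once the gap exceeds the average delay, which matches the stated intent that influence decays with time.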

Step 4: Define the credit assigned to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is defined as:

$$\Gamma_{v,u}(a) = \sum_{w \in N_{in}(u)} \Gamma_{v,w}(a) \cdot \gamma_{w,u}(a)$$

where N_in(u) denotes the set of in-neighbors of node u, w is an in-neighbor of u (w ∈ N_in(u)), and γ_{w,u}(a) denotes the direct credit given to node w for influencing its adjacent node u. The formula states that, for any two nodes v and u, the total credit given to v for influencing u is obtained by taking every in-neighbor w of u as an intermediate node and summing the products of the credits along the way from v to u. γ_{w,u}(a) is set equal to the user influence of node w on node u for behavior a, i.e., γ_{w,u}(a) = Ψ^{w,u}(a). Similarly, the total credit given to a node set S for influencing node u is computed as follows:

$$\Gamma_{S,u}(a) = \sum_{w \in N_{in}(u) \,\cap\, w \notin S} \Gamma_{S,w}(a) \cdot \gamma_{w,u}(a)$$
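The pairwise recursion in Step 4 can be evaluated for a single behavior by visiting nodes in the order in which they acted, so that every in-neighbor's credit is known before it is needed. A minimal sketch; the base case Γ_{v,v}(a) = 1 is an assumption borrowed from the usual credit-distribution formulation and is not stated in the text, and `total_credit`, `order`, and `direct` are illustrative names:

```python
def total_credit(order, direct):
    """Compute total credit for every ordered pair (v, u) for one behavior.

    order  -- nodes sorted by the time at which they performed the behavior
    direct -- {(w, u): gamma} direct credit, defined only where w acted before u
    """
    in_neighbors = {}
    for (w, u), gamma in direct.items():
        in_neighbors.setdefault(u, []).append((w, gamma))

    credit = {}
    for i, u in enumerate(order):
        credit[(u, u)] = 1.0  # assumed base case: a node has full credit for itself
        for v in order[:i]:
            # Sum over in-neighbors w of u: credit(v -> w) * direct credit(w -> u).
            credit[(v, u)] = sum(
                credit.get((v, w), 0.0) * gamma
                for w, gamma in in_neighbors.get(u, [])
            )
    return credit


order = ["v", "w", "u"]
direct = {("v", "w"): 0.5, ("w", "u"): 0.4, ("v", "u"): 0.2}
credit = total_credit(order, direct)
```

In this toy log, v reaches u both directly (0.2) and through w (0.5 × 0.4), and the two contributions are added.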

Step 5: Traverse the user behavior propagation log, assign credit backwards along the behavior propagation paths, and compute the marginal gain of node x over all behaviors:

σ_cd(x) is the influence propagation function for node x, S is the current initial node set, and V is the set of all nodes in the network. The formula uses the credit given to node x for influencing a node u in the node set V-S through behavior a; the total credit given to the node for influencing the other nodes outside the current initial node set S; and Γ_{S,x}(a), which denotes, for behavior a, the credit of the current initial node set S. A higher credit for a node indicates greater influence. Using the greedy algorithm, the node with the largest marginal gain is repeatedly selected and inserted into the initial node set S;

Step 6: Check whether the number of elements in the initial node set has reached the required number k. If it has, the final initial node set is obtained; if not, update the credit distribution between the nodes outside the current initial node set and return to Step 5.
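Steps 5 and 6 form a standard greedy loop: score every remaining node, add the best one, and repeat until k nodes are chosen. The sketch below shows only that control flow; `marginal_gain` is a caller-supplied hypothetical function, and the toy coverage gain used in the example merely stands in for the credit-based marginal gain:

```python
def greedy_select(candidates, k, marginal_gain):
    """Pick up to k nodes, each time taking the node with the largest marginal gain."""
    seeds = []
    remaining = set(candidates)
    while len(seeds) < k and remaining:
        # sorted() makes tie-breaking deterministic: the first name in sort order wins.
        best = max(sorted(remaining), key=lambda x: marginal_gain(seeds, x))
        seeds.append(best)
        remaining.remove(best)
    return seeds


# Toy stand-in for the gain: how many items a node reaches that are not yet covered.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}


def coverage_gain(seeds, x):
    covered = set().union(*(reach[s] for s in seeds))
    return len(reach[x] - covered)


seeds = greedy_select(reach, k=2, marginal_gain=coverage_gain)
```

In the method described here, the gain function would be recomputed from the updated credit distribution after each insertion, which is the "update and return to Step 5" branch of Step 6.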

The present invention is a scheme for selecting and evaluating the initial nodes for influence maximization based on node characteristics. By combining user characteristics, the association between users and behaviors, and the behavioral similarity between users, the method evaluates the influence between nodes more realistically. The evaluation criteria for influence and credit are also refined according to temporal characteristics, so that they reflect the decay of influence between nodes over time, and the network structure and the user behavior log are combined to distribute credit and to model the influence propagation process. By learning from users' real behavior records, the method selects initial nodes more accurately and effectively and achieves a better influence spread.

Brief description of the drawings

Fig. 1 is a flow chart of the method for selecting initial nodes for influence maximization based on node characteristics proposed by the present invention;

Fig. 2 shows the key steps of the method for selecting initial nodes for influence maximization based on node characteristics proposed by the present invention;

Fig. 3 compares the running time consumed by the different methods in Embodiment 1 to select the initial nodes; the left panel compares the running times of the 4 methods over a range of 0 to 160 minutes, and the right panel compares the running times of 2 methods over a range of 1 to 3 minutes;

Fig. 4 compares the memory consumed by the different methods in Embodiment 1 to select the initial nodes; the left panel compares the memory consumption of the 4 methods over a range of 0 to 100 MB, and the right panel compares the memory consumption of 2 methods over a range of 0 to 50 MB;

Fig. 5 compares the influence spread achieved by the initial node sets selected by the different methods in Embodiment 2; the left panel plots, as curves, the influence spread of the initial node sets selected by the 4 methods, and the right panel shows, as a histogram, the influence spread of the initial node sets selected by 2 methods;

Fig. 6 compares the influence prediction results for the test-set user behaviors in Embodiment 2; the left panel is a scatter plot of the influence predicted by the 2 methods for the test-set behaviors, and the right panel is a histogram of the influence predicted by the 2 methods.

Embodiment

The present invention is described in further detail below with reference to the drawings, theoretical analysis, and simulation experiments.

The present invention models the social network as an undirected graph G=(V, E), where V is the set of all nodes in the graph, E is the set of relationships between individuals, S is the initial node set, and σ(S) is the influence propagation function. In the present invention, influence is expressed as the credit assigned to a node: the higher the credit assigned to a node, the greater its influence. σ_cd(S) is therefore defined as the total credit given to the current node set S for influencing the remaining nodes. The influence maximization problem is to find an initial node set of size k such that the expected number of nodes successfully influenced in the whole network is maximized.

Node weights reflect the specific characteristics of each node. On the one hand, different individuals have different interests and preferences; when different people encounter the same thing, whether they accept it depends on their own evaluation of it and on their cognitive acceptance threshold. On the other hand, individuals differ in their influence on others: in general, a recommendation from a friend or relative has a stronger influence than one from a stranger. We use the degree of intimacy or trust between individuals to capture this characteristic. The method creates and simulates the behavior propagation process from the records of information propagation in a real network, and evaluates the influence between adjacent nodes in terms of user activity, user sensitivity, and user cohesion. Fig. 1 and Fig. 2 show, respectively, the flow chart and the key steps of the proposed method; the specific implementation steps are as follows.

Step 2: Traverse the user behavior log and, for each node in the network, compute the user activity, user sensitivity, and user cohesion. For a node u, the user activity act(u) is defined as:

Ψ^{v,u}(a) denotes the user influence of node v on node u for behavior a; t_v^0(a) is the time at which node v performed behavior a; N_out(v) denotes the set of out-neighbors of node v, with u ∈ N_out(v); and t_u(a) denotes the time at which node u was eventually influenced and performed the same behavior a;

N_in(u) denotes the set of in-neighbors of node u, w is an in-neighbor of u (w ∈ N_in(u)), and γ_{w,u}(a) denotes the direct credit given to node w for influencing its adjacent node u. For any two nodes v and u, the total credit given to v for influencing u is obtained by taking every in-neighbor w of u as an intermediate node and summing the products of the credits along the way from v to u. γ_{w,u}(a) is set equal to the user influence of node w on node u for behavior a. Similarly, the total credit given to a node set S for influencing node u is computed as follows:

To verify the effectiveness of the invention, the marginal gain of a node is derived theoretically below from the influence function.

The total credit Γ_{S,u}(a) given to a node set S for influencing node u equals the sum, over all direct in-neighbors w of node u, of the credit given to S for first influencing w and then influencing u, that is:

N_in(u) denotes the set of in-neighbors of node u, and node w is an in-neighbor of u, w ∈ N_in(u) ∩ w ∈ S. Γ_{S,w}(a) is the credit given to set S for influencing w for behavior a, and γ_{w,u}(a) is the credit given to w for influencing its direct neighbor u.

After a node x is inserted into the initial node set S because its marginal gain is the largest, the credit distribution of the other nodes in the network must be updated. Once node x has been selected as an initial node, it acts from the initial moment as a transmitter of information, and its function as a relay in the propagation of influence is disabled. Therefore, when computing the credit distribution between any two nodes outside the current initial node set S, the credit assigned along all paths passing through node x must be subtracted:
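One common way to carry out this update, assumed here rather than taken from the text, is to subtract from each pair's credit the product of the credit from v to x and the credit from x to u, which is the share that flowed through x. `remove_node_paths` and the credit-table layout are illustrative:

```python
def remove_node_paths(credit, x):
    """Return a credit table with x removed as a relay.

    credit -- {(v, u): total credit of v for influencing u} for one behavior
    x      -- the node that has just been selected as an initial node
    """
    updated = {}
    for (v, u), value in credit.items():
        if x in (v, u):
            continue  # pairs that involve x itself leave the table
        # Credit that reached u from v by passing through x (assumed product form).
        through_x = credit.get((v, x), 0.0) * credit.get((x, u), 0.0)
        updated[(v, u)] = value - through_x
    return updated


credit = {("v", "w"): 0.5, ("w", "u"): 0.4, ("v", "u"): 0.4}
updated = remove_node_paths(credit, "w")
```

In the toy table, 0.2 of v's credit for u had been relayed by w, so after w is selected only the direct share of 0.2 remains.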

Then, after node x is inserted into the initial set S, the difference in credit is

Because node x is known not to be in the original initial node set, the credit difference is the difference between the credit given to node x for influencing node u and the total credit given to S along distribution paths that do not pass through node x, i.e.

According to formula (2), this expression can be simplified. Because behavior propagation paths obey temporal order, no loop can exist, so node v cannot lie on a distribution path between node x and node u; formula (4) therefore simplifies to

From formula (5), the marginal gain of node x over all behaviors is obtained:

Factoring out the common term on the right-hand side simplifies the expression, and rearranging the parameters in the formula then gives

The theoretical analysis above shows that computing the marginal gain of node x requires only the total credit given to node x for influencing the other nodes outside the current initial node set S and, for behavior a, the credit Γ_{S,x}(a) of the current initial node set S, which supports the accuracy and validity of the method.
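For orientation, the three relations that this derivation walks through can be written in the following form. This is a sketch in the notation of the general credit-distribution model, not a reproduction of the patent's own formulas: the superscript (the node set over which propagation paths are allowed to run) and the omission of any per-user normalization factor are assumptions.

$$
\begin{aligned}
\Gamma^{W-x}_{v,u}(a) &= \Gamma^{W}_{v,u}(a) - \Gamma^{W}_{v,x}(a)\,\Gamma^{W}_{x,u}(a) \\
\Gamma_{S\cup\{x\},u}(a) - \Gamma_{S,u}(a) &= \Gamma^{V-S}_{x,u}(a)\,\bigl(1 - \Gamma_{S,x}(a)\bigr) \\
\sigma_{cd}(S\cup\{x\}) - \sigma_{cd}(S) &= \sum_{a}\bigl(1 - \Gamma_{S,x}(a)\bigr)\sum_{u \in V-S}\Gamma^{V-S}_{x,u}(a)
\end{aligned}
$$

Read this way, the factor taken out of the sum is (1 − Γ_{S,x}(a)), and the marginal gain of x depends only on the two quantities named in the text: the credit of x over the nodes outside S, and the credit Γ_{S,x}(a) that S already holds for x.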

In the experiments, the data set was collected from the real photo-sharing website Flickr; the full data set contains 105938 photos. It is divided into 4 parts according to the source of the photos, and we select one part as the experimental subject, containing 2602 nodes, 222292 edges, and 24648 photos. Because credit distribution is subject to time constraints and network structure, we process the raw data into two files: a graph file that records the relationships between the photos' authors, and a user behavior log file that contains user behavior records ordered by time of occurrence. We further divide the user behavior records into two parts: a training set containing 2724 distinct behaviors and a test set containing 1816 distinct behaviors. The user behavior log file contains 75269 records.
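The text does not say how the behaviors were assigned to the two parts; 2724 of the 4540 behaviors is a 60% share. A minimal sketch, assuming a random split by behavior with that share so that all records of one behavior stay together; the function name, the seed, and the `(user, behavior, time)` record layout are hypothetical:

```python
import random


def split_by_behavior(log, train_fraction=0.6, seed=0):
    """Split (user, behavior, time) records so each behavior falls wholly in one part."""
    behaviors = sorted({behavior for _user, behavior, _time in log})
    random.Random(seed).shuffle(behaviors)  # assumption: random assignment
    cut = round(len(behaviors) * train_fraction)
    train_ids = set(behaviors[:cut])
    train = [record for record in log if record[1] in train_ids]
    test = [record for record in log if record[1] not in train_ids]
    return train, test


# Toy log: 10 behaviors, each performed by 3 users.
log = [(f"user{i % 3}", f"photo{i % 10}", i) for i in range(30)]
train, test = split_by_behavior(log)
```

Keeping each behavior in exactly one part matters here because credit is learned per behavior from the training log and then checked against behaviors the model has not seen.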

Embodiment 1:

In this embodiment, we set the parameter λ=0.5 in order to balance the weight of the behaviors that node u initiates actively against that of the behaviors it performs passively under the influence of its neighbor nodes.

As Fig. 3 shows, the running time required by the method of the invention (CD2) to select the initial nodes is slightly higher than that of node selection on the traditional credit distribution model (CD). The recorded running times show that, as the number of initial nodes increases, running time grows linearly and is far lower than the time needed to select the same number of initial nodes on the independent cascade model (IC) and the linear threshold model (LT). The results show that the method is efficient and scalable in terms of running time.

As Fig. 4 shows, as the number of nodes in the initial set increases, the memory consumed by the method of the invention (CD2) to select the initial nodes is slightly higher than that of the traditional credit distribution model (CD), because of the added work of assessing node characteristics and computing user influence. When selecting the same number of initial nodes, however, the method consumes less memory than the independent cascade model (IC) and the linear threshold model (LT), and this advantage becomes more pronounced as more initial nodes are selected. The results show that, for selecting initial nodes for influence maximization, the method has advantages in both running time and memory consumption.

Embodiment 2:

As Fig. 5 shows, the method of the invention (CD2) describes and propagates influence more strongly than the traditional model (CD). Compared with the independent cascade model (IC) and the linear threshold model (LT), a further advantage of the method is that it learns from users' real behavior records and evaluates user influence using user characteristics rather than network structure alone, so it reflects user behavior and influence propagation more realistically and more reliably.

The method of the invention (CD2) and CD are compared in predicting influence propagation for the behaviors recorded in the test set, which contains all 1816 behaviors. After the experiment, we sort the behaviors by their real influence spread and compare the predicted results with the actual values. As shown in Fig. 6, both methods predict an influence spread lower than the real values for the test-set behavior samples, but the comparison shows that, relative to the traditional method (CD), the method of the invention improves the prediction of user behavior to some degree and predicts influence more accurately.

The experiments above show that the method is efficient in both running time and memory consumption, that by learning from real behavior propagation records it reflects the propagation of user behavior and influence more realistically, and that its selection of initial nodes is more accurate and reliable.

Claims

1. A method for selecting initial nodes for influence maximization based on node characteristics, comprising the following steps:

Step 1: Process the online social network data set to obtain a real user behavior log and a network structure file;

Step 2: Traverse the user behavior log and, for each node in the network, compute the user activity, user sensitivity, and user cohesion. For a node u, the user activity act(u) is defined as:

The quantities in this definition are the number of behaviors performed by node u, the number of behaviors that node u performed passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0,1). The user sensitivity is defined as follows:

The quantities in this definition are the time at which behavior a was first performed among all neighbor nodes of node u; t_u(a), the time at which node u was eventually influenced and performed the same behavior a; and τ_u, the average delay between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the sensitivity value. The user cohesion p_{v,u} is computed as follows:

The two quantities in this formula are the union of the sets of behavior types performed by node u and node v, and the intersection of those sets;

Step 3: Normalize user sensitivity and user cohesion respectively. The average user sensitivity and the average user cohesion of node u are computed as follows:

$$\hat{p}_u = \frac{1}{|N(u)|} \sum_{v \in N(u)} p_{v,u}$$

In these rules, the behaviors considered are those recorded in the training set, N(u) denotes the set of nodes adjacent to node u, and v ∈ N(u). The initial user influence is defined as:

Given the average delay τ_{v,u} between node u and its neighbor node v and the initial user influence iniful(u) of node u, the influence between adjacent nodes v and u is transformed with a continuous decay function, computed as follows:

$$\Psi^{v,u}_{u \in N_{out}(v)}(a) = iniful(v) \cdot \exp\left(-\frac{\left(t_u(a) - t_v^0(a)\right)^2}{\sqrt{4\pi} \cdot \tau_{v,u}}\right)$$

Here Ψ^{v,u}(a) denotes, for behavior a, the user influence of node v on node u; t_v^0(a) is the time at which node v performed behavior a; N_out(v) denotes the set of out-neighbors of node v, with u ∈ N_out(v); and t_u(a) denotes the time at which node u was eventually influenced and performed the same behavior a;

Step 4: Define the credit assigned to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is defined as:

$$\Gamma_{v,u}(a) = \sum_{w \in N_{in}(u)} \Gamma_{v,w}(a) \cdot \gamma_{w,u}(a)$$

where N_in(u) denotes the set of in-neighbors of node u, w is an in-neighbor of u (w ∈ N_in(u)), and γ_{w,u}(a) denotes the direct credit given to node w for influencing its adjacent node u. The formula states that, for any two nodes v and u, the total credit given to v for influencing u is obtained by taking every in-neighbor w of u as an intermediate node and summing the products of the credits along the way from v to u. γ_{w,u}(a) denotes, for behavior a, the direct credit given to node w for influencing its adjacent node u, and its value equals the user influence of node w on node u, i.e., γ_{w,u}(a) = Ψ^{w,u}(a). Similarly, the total credit given to a node set S for influencing node u is computed as follows:

$$\Gamma_{S,u}(a) = \sum_{w \in N_{in}(u) \,\cap\, w \notin S} \Gamma_{S,w}(a) \cdot \gamma_{w,u}(a)$$

Step 5: Traverse the user behavior propagation log, assign credit backwards along the behavior propagation paths, and compute the marginal gain of node x over all behaviors:

σ_cd(x) is the influence propagation function for node x, S is the current initial node set, and V is the set of all nodes in the network. The formula uses the credit given to node x for influencing a node u in the node set V-S through behavior a; the total credit given to the node for influencing the other nodes outside the current initial node set S; and Γ_{S,x}(a), which denotes, for behavior a, the credit of the current initial node set S. A higher credit for a node indicates greater influence. Using the greedy algorithm, the node with the largest marginal gain is repeatedly selected and inserted into the initial node set S;

Step 6: Check whether the number of elements in the initial node set has reached the required number k. If it has, the final initial node set is obtained; if not, update the credit distribution between the nodes outside the current initial node set and return to Step 5.