CN104616200A

CN104616200A - Influence maximization initial node selecting method based on node characteristics

Info

Publication number: CN104616200A
Application number: CN201510072839.0A
Authority: CN
Inventors: 邓晓衡; 潘琰; 朱从旭; 林立新; 沈海澜; 李登
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2015-02-11
Filing date: 2015-02-11
Publication date: 2015-05-13
Anticipated expiration: 2035-02-11
Also published as: CN104616200B

Abstract

The invention provides a method, based on node characteristics, for selecting the initial nodes for influence maximization in an online social network. The method first evaluates the characteristics of the nodes in the network in terms of user activeness, user sensitivity and user intimacy, and redefines and distributes credit values between nodes based on these three factors. The credit value between two nodes represents the influence between them: if two adjacent nodes perform the same behavior in succession, the latter is deemed to have been influenced by the former and assigns credit to the former. The network structure and the user behavior log are combined to calculate the credit value between any two nodes, and a greedy algorithm is then used to recursively select the nodes with the highest marginal gain to form the initial node set for influence maximization. The method overcomes the shortcoming of evaluating node influence by node degree alone, reduces running time and memory consumption, and describes and predicts the influence propagation process more realistically and effectively.

Description

An influence maximization initial node selection method based on node characteristics

Technical field

The invention belongs to the field of computer technology and relates to an influence maximization initial node selection method based on node characteristics.

Background

The development of the Internet has not only brought us a more convenient way of life but has also greatly changed the way we exchange information and communicate. With the development of online social networks, the ways in which we make friends and share knowledge have become richer and more varied. As more and more people use more convenient data exchange services such as mobile terminals, our social structure and social relationship networks have become more complex and tightly knit. Generally, a graph structure is used to model the interpersonal relationships in a social group: nodes represent individuals, and edges or arcs represent the relationships between individuals. Through the relationships between users in an online social network, information can spread at great speed and at minimal cost. For this reason, the propagation and distribution of influence in social networks bring unprecedented opportunities and challenges to viral marketing, and how to find the initial user group that makes the final spread of information as wide as possible has become a hot research topic.

For the influence maximization problem, most current research either optimizes the traditional classical influence cascade models or improves the accuracy of heuristic algorithms. Influence is assessed mainly on the basis of network structure and node degree, while the characteristics of the users themselves and the behavioral similarity between users are seldom mined and applied to the assessment of node influence.

To address the above deficiencies, we propose an evaluation scheme for initial node influence. The method combines user characteristics, the association between users and behaviors, and the behavioral similarity between users to evaluate the influence between nodes. In addition, we distribute credit according to the proposed criterion for evaluating influence between nodes, and obtain the initial node set for influence maximization in combination with a greedy algorithm.

Summary of the invention

The present invention proposes a more realistic and effective influence maximization initial node selection method based on node characteristics. When evaluating the influence between nodes, it combines user activeness, user sensitivity and user intimacy; it computes the magnitude of user influence between nodes according to temporal characteristics; it models credit distribution and the influence propagation process by combining the network structure with the user behavior log; and finally, in combination with a greedy algorithm, it selects the nodes with the largest marginal gain to obtain the initial node set for influence maximization. The specific steps are as follows:

Step 1: process the online social network data set to obtain the real user behavior log and the network structure file;

Step 2: traverse the user behavior log and, for each node in the network, calculate the user activeness, the user sensitivity and the user intimacy. For a node u, the user activeness act(u) is defined as:

In this definition, the three quantities are the number of behaviors that node u performs, the number of behaviors that node u performs passively under the influence of its neighbor nodes, and the total number of behaviors recorded in the training set. The parameter λ weights the two behavior counts and takes values in (0, 1). The user sensitivity is defined as follows:

In this definition, the first time denotes the earliest moment at which any neighbor of node u performs behavior a, t_u(a) denotes the moment at which node u is finally influenced and performs the same behavior a, and τ_u denotes the average delay time between node u and its neighbor nodes. The longer the time span between the two moments, the smaller the sensitivity value. The user intimacy p_{v,u} is computed as follows:

where the two sets involved are the union and the intersection of the sets of behavior kinds performed by node u and node v;
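The intimacy definition above refers only to the intersection and the union of the two nodes' behavior-kind sets. A minimal Python sketch follows, assuming the ratio of intersection size to union size (a Jaccard-style similarity). The exact formula is not reproduced in this text, and the function name `intimacy` is hypothetical:

```python
def intimacy(kinds_u, kinds_v):
    """Assumed intimacy p_{v,u}: shared behavior kinds over all behavior kinds.

    kinds_u, kinds_v: iterables of the behavior kinds performed by u and v.
    The intersection-over-union form is an assumption, not the patent's
    stated formula.
    """
    set_u, set_v = set(kinds_u), set(kinds_v)
    union = set_u | set_v
    if not union:
        # Neither node has performed any behavior; treat intimacy as zero
        # (a choice made for this sketch).
        return 0.0
    return len(set_u & set_v) / len(union)


# Two shared kinds (a2, a3) out of four distinct kinds.
p_vu = intimacy({"a1", "a2", "a3"}, {"a2", "a3", "a4"})
```

Under this assumption the value is symmetric in u and v and lies in [0, 1].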

Step 3: normalize the user sensitivity and the user intimacy respectively. The average user sensitivity and the average user intimacy of node u are computed as follows:

\hat{p}_{u} = \frac{1}{|N(u)|} \sum_{v \in N(u)} p_{v,u}

where the behaviors considered are those recorded in the training set, N(u) denotes the set of nodes adjacent to node u, and v ∈ N(u). The initial user influence is defined as:

Given the average delay time τ_{v,u} between node u and its neighbor node v and the initial user influence iniful(u) of node u, a continuous decay function is used to convert the influence between the adjacent nodes v and u, computed as follows:

\psi^{v,u}_{u \in N_{out}(v)}(a) = \mathrm{iniful}(v) \cdot \exp\left(-\frac{(t_{u}(a) - t_{v}^{0}(a))^{2}}{\sqrt{4\pi} \cdot \tau_{v,u}}\right)

where ψ^{v,u}(a) denotes the influence of node v on node u for behavior a, t_v^0(a) denotes the moment at which node v performs behavior a, N_out(v) denotes the out-neighbor set of node v, u ∈ N_out(v), and t_u(a) denotes the moment at which node u is finally influenced and performs the same behavior a;
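The average-intimacy formula and the decay formula of Step 3 can be sketched in Python as follows. The function and parameter names are hypothetical, and the initial influence iniful(v) is taken as an input because its defining formula is not reproduced in this text:

```python
import math


def average_intimacy(u, neighbors, p):
    """Mean of p[(v, u)] over all v in N(u).

    neighbors: dict mapping a node to the collection of its adjacent nodes.
    p: dict mapping (v, u) to the intimacy p_{v,u}.
    Returns 0.0 for a node with no neighbors (a choice made for this sketch).
    """
    nbrs = neighbors.get(u, ())
    if not nbrs:
        return 0.0
    return sum(p[(v, u)] for v in nbrs) / len(nbrs)


def decayed_influence(iniful_v, t_u, t_v0, tau_vu):
    """psi^{v,u}(a) = iniful(v) * exp(-(t_u - t_v0)^2 / (sqrt(4*pi) * tau_vu)).

    t_u: moment at which u performs behavior a.
    t_v0: moment at which v performs behavior a.
    tau_vu: average delay time between v and u (must be positive).
    """
    exponent = -((t_u - t_v0) ** 2) / (math.sqrt(4 * math.pi) * tau_vu)
    return iniful_v * math.exp(exponent)
```

With a zero delay the influence equals iniful(v), and it shrinks as the gap between the two moments grows, which matches the decay-over-time behavior the step describes.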

Step 4: define the credit assigned to node v for influencing node u. For any two nodes v and u, the total credit given to node v for influencing node u is defined as:

\Gamma_{v,u}(a) = \sum_{w \in N_{in}(u)} \Gamma_{v,w}(a) \cdot \gamma_{w,u}(a)

where N_in(u) denotes the in-neighbor set of node u, node w is an in-neighbor of node u, w ∈ N_in(u), and γ_{w,u}(a) denotes the direct credit given to node w for influencing its adjacent node u. The formula states that, for any two nodes v and u, the total credit given to node v for influencing node u is obtained by taking every in-neighbor w of node u as an intermediate node and summing the products of the credit given to node v for influencing w and the direct credit given to w for influencing u. γ_{w,u}(a) is set equal to the influence of node w on node u for behavior a, that is, γ_{w,u}(a) = ψ^{w,u}(a). Similarly, the total credit given to a node set S for influencing node u is computed as follows:

\Gamma_{S,u}(a) = \sum_{w \in N_{in}(u),\ w \in S} \Gamma_{S,w}(a) \cdot \gamma_{w,u}(a)
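The recursion for Γ_{v,u}(a) in Step 4 can be evaluated in a single pass if the nodes are visited in the order in which they performed behavior a. The Python sketch below assumes a base case Γ_{v,v}(a) = 1, which this text does not state, and uses hypothetical names throughout:

```python
def total_credit(order, gamma, v):
    """Total credit Gamma_{v,u}(a) for every node u, for one behavior a.

    order: nodes sorted by the moment they performed the behavior,
           earliest first, so every in-neighbor is processed before u.
    gamma: dict mapping (w, u) to the direct credit gamma_{w,u}(a).
    v: the node whose credit is being accumulated.
    Assumes Gamma_{v,v}(a) = 1 as the base case (an assumption).
    """
    # Index the direct credits by the receiving node u.
    in_neighbors = {}
    for (w, u), g in gamma.items():
        in_neighbors.setdefault(u, []).append((w, g))

    credit = {node: 0.0 for node in order}
    credit[v] = 1.0
    for u in order:
        if u == v:
            continue
        # Gamma_{v,u} = sum over in-neighbors w of Gamma_{v,w} * gamma_{w,u}
        credit[u] = sum(credit.get(w, 0.0) * g for w, g in in_neighbors.get(u, []))
    return credit


# v acts first, then w, then u; u has in-neighbors v and w.
example = total_credit(
    ["v", "w", "u"],
    {("v", "w"): 0.5, ("v", "u"): 0.2, ("w", "u"): 0.4},
    "v",
)
```

In the example, u receives 1.0 · 0.2 directly from v plus 0.5 · 0.4 through w.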

Step 5: traverse the user behavior propagation log, assign credit backward along the behavior propagation paths, and compute the marginal gain of node x over all behaviors:

\sigma_{cd}(S + x) - \sigma_{cd}(S) = \sum_{u \in V} \frac{1}{A_{u}} \sum_{a \in A_{u}} \left(\Gamma_{S+x,u}(a) - \Gamma_{S,u}(a)\right) = \sum_{a \in A_{u}} \left\{ \left(1 - \Gamma_{S,x}(a)\right) \cdot \sum_{u \in V} \left( \frac{1}{A_{u}} \cdot \Gamma_{x,u}^{V-S}(a) \right) \right\}

where σ_cd(x) is the influence propagation function for node x, S is the current initial node set, V denotes the set of all nodes in the network, Γ^{V-S}_{x,u}(a) is the credit given to node x, within the node set V - S, for influencing node u through behavior a, the second factor on the right-hand side is the total credit given to node x for influencing the other nodes outside the current initial node set S, and Γ_{S,x}(a) denotes the credit value given to the current initial node set S for behavior a. The higher the credit value of a node, the greater its influence. In combination with a greedy algorithm, the node with the largest marginal gain is selected recursively and added to the initial node set S;
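The right-hand side of the marginal gain formula in Step 5 can be evaluated from precomputed credit tables. The Python sketch below assumes those tables already exist as dictionaries; the names `credit_S`, `credit_x` and `num_actions` are hypothetical:

```python
def marginal_gain(actions, credit_S, credit_x, num_actions):
    """Sum over a of (1 - Gamma_{S,x}(a)) * sum over u of Gamma^{V-S}_{x,u}(a) / A_u.

    actions: iterable of behaviors a.
    credit_S: dict mapping a to Gamma_{S,x}(a), the credit S holds on x.
    credit_x: dict mapping a to {u: Gamma^{V-S}_{x,u}(a)}.
    num_actions: dict mapping u to A_u, the number of behaviors u performed.
    Missing entries are treated as zero credit.
    """
    gain = 0.0
    for a in actions:
        reach = sum(c / num_actions[u] for u, c in credit_x.get(a, {}).items())
        gain += (1.0 - credit_S.get(a, 0.0)) * reach
    return gain


gain_x = marginal_gain(
    ["a"],
    {"a": 0.5},
    {"a": {"u1": 1.0, "u2": 0.5}},
    {"u1": 2, "u2": 1},
)
```

The factor (1 - Γ_{S,x}(a)) discounts the part of x's reach that the current set S already accounts for.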

Step 6: check whether the number of elements in the initial node set has reached the required number k. If so, the final initial node set is obtained; if not, update the credit distribution between the nodes outside the current initial node set and return to Step 5.
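Steps 5 and 6 together form a greedy loop. The Python sketch below writes that loop iteratively rather than recursively; `marginal_gain` and `update_credits` are hypothetical callables standing in for the Step 5 computation and the Step 6 credit update, and are not defined by the text:

```python
def greedy_select(candidates, k, marginal_gain, update_credits):
    """Pick up to k initial nodes, one per round, by largest marginal gain.

    candidates: iterable of candidate nodes.
    marginal_gain(selected, x): gain of adding x to the current selection.
    update_credits(selected, x): refreshes the credit distribution after x
        has been added (called once per round).
    """
    selected = []
    remaining = list(candidates)
    while len(selected) < k and remaining:
        # Step 5: choose the node with the largest marginal gain.
        best = max(remaining, key=lambda x: marginal_gain(selected, x))
        selected.append(best)
        remaining.remove(best)
        # Step 6: update credits among the nodes outside the selected set.
        update_credits(selected, best)
    return selected


# Toy usage with fixed gains and a no-op credit update.
toy_gains = {"a": 3.0, "b": 5.0, "c": 1.0}
seeds = greedy_select(["a", "b", "c"], 2, lambda S, x: toy_gains[x], lambda S, x: None)
```

In a real run the gains change between rounds because `update_credits` removes the credit that flows through the newly selected node.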

The present invention is an evaluation scheme for selecting influence maximization initial nodes based on node characteristics. The method combines user characteristics, the association between users and behaviors, and the behavioral similarity between users to evaluate the influence between nodes more realistically. It also improves the evaluation criteria for influence and credit values according to temporal characteristics, consistent with the property that influence between nodes decays over time, and models credit distribution and the influence propagation process by combining the network structure with the user behavior log. By learning from users' real behavior records, the method can select initial nodes more accurately and effectively and obtain a better influence propagation effect.

Description of the drawings

Fig. 1 is a flow chart of the influence maximization initial node selection method based on node characteristics proposed by the present invention;

Fig. 2 shows the key steps of the influence maximization initial node selection method based on node characteristics proposed by the present invention;

Fig. 3 compares the running time consumed by the different methods for initial node selection in Embodiment 1; the left panel compares the running times of the 4 methods over a span of 0 to 160 minutes, and the right panel compares the running times of 2 methods over a span of 1 to 3 minutes;

Fig. 4 compares the memory consumed by the different methods for initial node selection in Embodiment 1; the left panel compares the memory consumption of the 4 methods over a range of 0 to 100 MB, and the right panel compares the memory consumption of 2 methods over a range of 0 to 50 MB;

Fig. 5 compares the influence propagation achieved by the initial node sets of the different methods in Embodiment 2; the left panel plots, as curves, the influence propagation results for the initial node sets chosen by the 4 methods, and the right panel shows, as a bar chart, the influence propagation results for the initial node sets chosen by 2 methods;

Fig. 6 compares the influence prediction results for the test set user behaviors in Embodiment 2; the left panel is a scatter plot of the influence predictions of 2 methods for the test set behaviors, and the right panel is a bar chart of the influence predictions of the 2 methods.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings, theoretical analysis and simulation experiments.

The present invention models the social network as an undirected graph G = (V, E), where V denotes the set of all nodes in the graph and E denotes the set of relationships between individuals. S denotes the initial node set and σ(S) denotes the influence propagation function. In the present invention, influence is expressed as the size of the credit value given to a node: the higher the credit value assigned to a node, the greater its influence. σ_cd(S) is therefore defined as the total credit given to the current node set S for influencing the remaining nodes. The influence maximization problem is to find an initial node set of size k that maximizes the expected number of nodes successfully influenced in the whole network.
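Read together with the first equality of the marginal gain formula in Step 5, the definition of σ_cd(S) can be written out as follows. This is a reconstruction under the assumption that A_u denotes the behaviors performed by node u (and, in the fraction, their number), as it does in that formula:

\sigma_{cd}(S) = \sum_{u \in V} \frac{1}{A_{u}} \sum_{a \in A_{u}} \Gamma_{S,u}(a)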

Node weights reflect the specificity of nodes. On the one hand, different individuals often have different hobbies and preferences; when different people face the same thing, whether they accept it depends on each individual's own evaluation of it and on the size of the individual's cognitive acceptance threshold. On the other hand, each individual's influence on others also differs. For example, a recommendation from a friend or relative generally has a stronger influence than one from a stranger; we use the degree of intimacy or trust between individuals to capture this property. The method of the invention creates and simulates the behavior propagation process from the information propagation records in a real network, while assessing the influence between adjacent nodes from three aspects: user activeness, user sensitivity and user intimacy. Fig. 1 and Fig. 2 show, respectively, the flow chart and the key steps of the influence maximization initial node selection method based on node characteristics proposed by the present invention. The specific implementation steps are as follows.

\hat{p}_{u} = \frac{1}{|N(u)|} \sum_{v \in N(u)} p_{v,u}

\psi^{v,u}_{u \in N_{out}(v)}(a) = \mathrm{iniful}(v) \cdot \exp\left(-\frac{(t_{u}(a) - t_{v}^{0}(a))^{2}}{\sqrt{4\pi} \cdot \tau_{v,u}}\right)

where ψ^{v,u}(a) denotes the influence of node v on node u for behavior a, t_v^0(a) denotes the moment at which node v performs behavior a, N_out(v) denotes the out-neighbor set of node v, u ∈ N_out(v), and t_u(a) denotes the moment at which node u is finally influenced and performs the same behavior a;

\Gamma_{v,u}(a) = \sum_{w \in N_{in}(u)} \Gamma_{v,w}(a) \cdot \gamma_{w,u}(a)

\Gamma_{S,u}(a) = \sum_{w \in N_{in}(u),\ w \in S} \Gamma_{S,w}(a) \cdot \gamma_{w,u}(a)

\sigma_{cd}(S + x) - \sigma_{cd}(S) = \sum_{u \in V} \frac{1}{A_{u}} \sum_{a \in A_{u}} \left(\Gamma_{S+x,u}(a) - \Gamma_{S,u}(a)\right) = \sum_{a \in A_{u}} \left\{ \left(1 - \Gamma_{S,x}(a)\right) \cdot \sum_{u \in V} \left( \frac{1}{A_{u}} \cdot \Gamma_{x,u}^{V-S}(a) \right) \right\}

To verify the validity of the present invention, the marginal gain of a node is derived and analyzed theoretically below from the influence function.

The total credit Γ_{S,u}(a) given to a node set S for influencing node u equals the sum, over all direct in-neighbors w of node u, of the credit given to S for first influencing w and then influencing node u, namely:

\Gamma_{S,u}(a) = \sum_{w \in N_{in}(u),\ w \in S} \Gamma_{S,w}(a) \cdot \gamma_{w,u}(a) \tag{1}

The in-neighbor set of node u is denoted N_in(u); node w is an in-neighbor of node u, with w ∈ N_in(u) and w ∈ S; Γ_{S,w}(a) is the credit given to the set S for influencing w for behavior a; and γ_{w,u}(a) is the credit given to w for influencing its direct neighbor node u.

After a node x is inserted into the initial node set S because its marginal gain is the largest, the credit distribution of the other nodes in the network must be updated. Once node x has been chosen as an initial node, it acts as a sender of information at the initial moment and no longer relays influence, so when computing the credit distribution between any two nodes outside the current initial node set S, the credit assigned along all paths through node x must be subtracted:

\Gamma_{v,u}^{V-S-x}(a) = \Gamma_{v,u}^{V-S}(a) - \Gamma_{v,x}^{V-S}(a) \cdot \gamma_{x,u}^{V-S}(a) \tag{2}
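The update in formula (2) can be sketched in Python for a single behavior a. The dictionary layout and the function name are hypothetical, and dropping the entries that involve x itself is a choice made for this sketch, since x leaves the pool of remaining nodes:

```python
def remove_paths_through(credit, gamma, x):
    """Apply formula (2) after node x joins the initial set.

    credit: dict mapping (v, u) to Gamma^{V-S}_{v,u}(a).
    gamma: dict mapping (x, u) to gamma^{V-S}_{x,u}(a).
    Returns a new dict holding Gamma^{V-S-x}_{v,u}(a) for pairs not involving x.
    Missing entries are treated as zero credit.
    """
    updated = {}
    for (v, u), c in credit.items():
        if x in (v, u):
            continue  # x is no longer among the remaining nodes
        through_x = credit.get((v, x), 0.0) * gamma.get((x, u), 0.0)
        updated[(v, u)] = c - through_x
    return updated


new_credit = remove_paths_through(
    {("v", "x"): 0.5, ("v", "u"): 0.4, ("x", "u"): 0.4},
    {("x", "u"): 0.4},
    "x",
)
```

In the example, half of v's credit on u (0.5 · 0.4 = 0.2) flowed through x and is removed.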

After node x is inserted into the initial set S, the difference in credit is

\Gamma_{S+x,u}(a) - \Gamma_{S,u}(a) = \sum_{v \in S+x} \Gamma_{v,u}^{V-S-x+v}(a) - \sum_{v \in S} \Gamma_{v,u}^{V-S+v}(a) \tag{3}

Since node x is known not to be in the original initial node set, the credit difference equals the credit given to node x for influencing node u minus the total credit given to S that is lost once the allocation paths through node x are excluded, namely

\Gamma_{S+x,u}(a) - \Gamma_{S,u}(a) = \Gamma_{x,u}^{V-S}(a) - \sum_{v \in S} \left(\Gamma_{v,u}^{V-S+v}(a) - \Gamma_{v,u}^{V-S-x+v}(a)\right) \tag{4}

According to formula (2), the difference inside the sum in formula (4) can be simplified. Because behavior propagation paths obey temporal order, no loop can occur, so node v cannot lie on the allocation path between node x and node u. Formula (4) therefore simplifies to

\Gamma_{S+x,u}(a) - \Gamma_{S,u}(a) = \Gamma_{x,u}^{V-S}(a) - \Gamma_{x,u}^{V-S}(a) \cdot \sum_{v \in S} \Gamma_{v,x}^{V-S+v}(a) \tag{5}
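The step from formula (4) to formula (5) can be spelled out as a sketch. Assuming formula (2) is applied with the node set V - S + v in place of V - S, each difference inside the sum of formula (4) becomes

\Gamma_{v,u}^{V-S+v}(a) - \Gamma_{v,u}^{V-S-x+v}(a) = \Gamma_{v,x}^{V-S+v}(a) \cdot \gamma_{x,u}^{V-S+v}(a)

Formula (5) then follows if the credit that x passes on to u is written as Γ^{V-S}_{x,u}(a) regardless of whether v is in the node set, which is the role of the no-loop argument: v does not lie on any allocation path from x to u. Identifying γ^{V-S+v}_{x,u}(a) with Γ^{V-S}_{x,u}(a) in this way is an interpretation of the text, not something it states explicitly.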

From formula (5), the marginal gain of node x over all behaviors is obtained:

\sigma_{cd}(S + x) - \sigma_{cd}(S) = \sum_{u \in V} \frac{1}{A_{u}} \sum_{a \in A_{u}} \left( \Gamma_{x,u}^{V-S}(a) - \Gamma_{x,u}^{V-S}(a) \cdot \sum_{v \in S} \Gamma_{v,x}^{V-S+v}(a) \right) \tag{6}

Factoring Γ^{V-S}_{x,u}(a) out of the right-hand side of formula (6), the sum Σ_{v∈S} Γ^{V-S+v}_{v,x}(a) can be written as Γ_{S,x}(a); rearranging the terms of the formula then gives

\sigma_{cd}(S + x) - \sigma_{cd}(S) = \sum_{a \in A_{u}} \left\{ \left(1 - \Gamma_{S,x}(a)\right) \cdot \sum_{u \in V} \left( \frac{1}{A_{u}} \cdot \Gamma_{x,u}^{V-S}(a) \right) \right\} \tag{7}

The above theoretical analysis shows that computing the marginal gain of node x only requires the total credit given to node x for influencing the other nodes outside the current initial node set S and, for behavior a, the credit value Γ_{S,x}(a) given to the current initial node set S, which verifies the correctness and validity of the method of the invention.

In the experiments, the data set was collected from the real photo sharing website Flickr; the full data set contains 105938 photos. It is divided into 4 parts according to the source of the photos, and we select one of them as the experimental subject, containing 2602 nodes, 222292 edges and 24648 photos. Because credit distribution obeys time constraints and the network structure, we process the raw data to obtain two files: a graph file that records the relationships between the authors of the photos, and a user behavior log file that contains user behavior records in chronological order. We further divide the user behavior records into two parts: a training set containing 2724 kinds of behaviors and a test set containing 1816 kinds of behaviors. The user behavior log file contains 75269 records.
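Before credit can be assigned along propagation paths, the behavior log has to be organized per behavior and in time order. A minimal Python sketch follows, assuming each log record is a hypothetical (user, behavior, timestamp) tuple; the actual file format of the experiments is not described in this text:

```python
from collections import defaultdict


def group_log_by_behavior(records):
    """Group log records by behavior and sort each group by timestamp.

    records: iterable of (user, behavior, timestamp) tuples (assumed layout).
    Returns a dict mapping each behavior to a list of (timestamp, user)
    pairs, earliest first.
    """
    traces = defaultdict(list)
    for user, behavior, timestamp in records:
        traces[behavior].append((timestamp, user))
    return {behavior: sorted(events) for behavior, events in traces.items()}


log = [("u2", "photo1", 5), ("u1", "photo1", 2), ("u3", "photo2", 1)]
traces = group_log_by_behavior(log)
```

Each resulting list gives the order in which users performed one behavior, which is the order needed when deciding which neighbor acted before which.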

Embodiment 1:

In this embodiment we set the parameter λ = 0.5, in order to balance the proportions of the behaviors that node u initiates actively and the behaviors it performs passively under the influence of neighbor nodes.

As shown in Fig. 3, the running time required to select initial nodes with the method of the invention (CD2) is slightly higher than that of selecting nodes with the traditional credit distribution model (CD). The recorded running times in the figure show that the running time grows linearly as the number of initial nodes increases, and is clearly smaller than the execution time for selecting the same number of initial nodes under the independent cascade model (IC) and the linear threshold model (LT). The experimental results show that the method of the invention is efficient and scalable in terms of running time.

As shown in Fig. 4, as the number of nodes in the initial set increases, the memory consumed by the method of the invention (CD2) for initial node selection is slightly higher than that of the traditional credit distribution model (CD), because the assessment of node characteristics and the evaluation of user influence are added. However, when selecting the same number of initial nodes, the method of the invention consumes less memory than the independent cascade model (IC) and the linear threshold model (LT), and this advantage becomes more pronounced as more initial nodes are selected. The experimental results show that, for influence maximization initial node selection, the method of the invention has clear advantages in both running time and memory consumption.

Embodiment 2:

As shown in Fig. 5, the method of this experiment (CD2) describes and propagates influence better than the traditional model (CD). Compared with the independent cascade model (IC) and the linear threshold model (LT), another advantage of the method of the invention is that it learns from users' real behavior records and evaluates user influence by combining user characteristics rather than relying on network structure alone, so it reflects user behavior and influence propagation more realistically and is more reliable.

The method of the invention (CD2) and CD are compared on influence propagation prediction for the behaviors recorded in the test set, which contains all 1816 kinds of behaviors. After the experiment, we sort the behaviors by their real influence propagation results and compare the experimental predictions with the actual values. As shown in Fig. 6, the influence propagation predictions of both the method of the invention and the traditional method for the test set behavior samples are lower than the real influence propagation values. The comparison nevertheless shows that, relative to the traditional method (CD), the method of the invention improves the prediction of user behavior to some degree and has higher influence prediction accuracy.

The above experiments show that the method of the invention is efficient in terms of both running time and memory consumption. By learning from real behavior propagation records, it reflects the propagation of user behavior and influence more realistically. In addition, the experiments show that the method of the invention selects initial nodes with higher accuracy and reliability.

Claims

1. An influence maximization initial node selection method based on node characteristics, comprising the following steps:

\hat{p}_{u} = \frac{1}{|N(u)|} \sum_{v \in N(u)} p_{v,u}

\Psi^{v,u}_{u \in N_{out}(v)}(a) = \mathrm{iniful}(v) \cdot \exp\left(-\frac{(t_{u}(a) - t_{v}^{0}(a))^{2}}{\sqrt{4\pi} \cdot \tau_{v,u}}\right)

\Gamma_{v,u}(a) = \sum_{w \in N_{in}(u)} \Gamma_{v,w}(a) \cdot \gamma_{w,u}(a)

\Gamma_{S,u}(a) = \sum_{w \in N_{in}(u),\ w \notin S} \Gamma_{S,w}(a) \cdot \gamma_{w,u}(a)